[jira] [Commented] (HBASE-16981) Expand Mob Compaction Partition policy from daily to weekly, monthly and beyond

2016-12-21 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769397#comment-15769397
 ] 

Jingcheng Du commented on HBASE-16981:
--

Thanks a lot Huaxiang! I will.

> Expand Mob Compaction Partition policy from daily to weekly, monthly and 
> beyond
> ---
>
> Key: HBASE-16981
> URL: https://issues.apache.org/jira/browse/HBASE-16981
> Project: HBase
>  Issue Type: New Feature
>  Components: mob
>Affects Versions: 2.0.0
>Reporter: huaxiang sun
>Assignee: huaxiang sun
> Attachments: HBASE-16981.master.001.patch, 
> HBASE-16981.master.002.patch, HBASE-16981.master.003.patch, 
> Supportingweeklyandmonthlymobcompactionpartitionpolicyinhbase.pdf
>
>
> Today the mob region holds all mob files for all regions. With the daily 
> partition mob compaction policy, after a major mob compaction there is still 
> one file per region per day. Given that there are 365 days in a year, that is 
> at least 365 files per region. Since HDFS limits the number of files under one 
> folder, this is not going to scale if there are lots of regions. To reduce the 
> mob file count, we want to introduce other partition policies, such as weekly 
> and monthly, that compact the mob files within one week or month into one 
> file. This jira is created to track this effort.
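For illustration, here is a minimal sketch of how a date could map to a 
partition id under each policy; the class and helper names are hypothetical, 
not taken from the attached patches:

{code}
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.WeekFields;

// Hypothetical sketch: derive a partition id from a mob file's date so that
// all files of the same day/week/month land in one partition and get
// compacted into a single file.
public class MobPartitionIdSketch {
  enum Policy { DAILY, WEEKLY, MONTHLY }

  static String partitionId(LocalDate date, Policy policy) {
    switch (policy) {
      case DAILY:   return date.format(DateTimeFormatter.BASIC_ISO_DATE);  // 20161221
      case WEEKLY:  return date.get(WeekFields.ISO.weekBasedYear()) + "w"
                         + date.get(WeekFields.ISO.weekOfWeekBasedYear()); // 2016w51
      case MONTHLY: return date.getYear() + "m" + date.getMonthValue();    // 2016m12
      default:      throw new AssertionError();
    }
  }

  public static void main(String[] args) {
    // 365 daily partitions per region collapse to ~52 weekly or 12 monthly ones.
    System.out.println(partitionId(LocalDate.of(2016, 12, 21), Policy.WEEKLY));
  }
}
{code}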



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17314) Limit total buffered size for all replication sources

2016-12-21 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HBASE-17314:
--
Attachment: HBASE-17314.v05.patch

Let's run a pre-commit test. I will push this patch if nothing is wrong. 

> Limit total buffered size for all replication sources
> -
>
> Key: HBASE-17314
> URL: https://issues.apache.org/jira/browse/HBASE-17314
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, 
> HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch, 
> HBASE-17314.v05.patch
>
>
> If we have many peers, or some servers have many recovered queues, we will 
> hold many entries in memory, which increases GC pressure and may even cause 
> OOM, because by default we read up to 64MB of entries into the buffer for one 
> source.
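A minimal sketch of the idea, with hypothetical names: one quota shared by all 
sources, instead of each source buffering up to 64MB on its own:

{code}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: all replication sources share one global buffer quota
// instead of each source independently buffering up to 64MB of entries.
public class GlobalReplicationBufferQuota {
  private final long limit;                       // total bytes allowed across all sources
  private final AtomicLong used = new AtomicLong();

  public GlobalReplicationBufferQuota(long limit) { this.limit = limit; }

  /** Try to reserve bytes for buffered entries; refuse when over the global limit. */
  public boolean tryAcquire(long bytes) {
    long newUsed = used.addAndGet(bytes);
    if (newUsed > limit) {
      used.addAndGet(-bytes);   // roll back; the caller waits and retries
      return false;
    }
    return true;
  }

  /** Release bytes once the entries have been shipped to the peer. */
  public void release(long bytes) {
    used.addAndGet(-bytes);
  }
}
{code}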



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17314) Limit total buffered size for all replication sources

2016-12-21 Thread Phil Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Yang updated HBASE-17314:
--
Status: Patch Available  (was: Reopened)

> Limit total buffered size for all replication sources
> -
>
> Key: HBASE-17314
> URL: https://issues.apache.org/jira/browse/HBASE-17314
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, 
> HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch, 
> HBASE-17314.v05.patch
>
>
> If we have many peers, or some servers have many recovered queues, we will 
> hold many entries in memory, which increases GC pressure and may even cause 
> OOM, because by default we read up to 64MB of entries into the buffer for one 
> source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17358) Unify backoff calculation

2016-12-21 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-17358:
-

 Summary: Unify backoff calculation
 Key: HBASE-17358
 URL: https://issues.apache.org/jira/browse/HBASE-17358
 Project: HBase
  Issue Type: Sub-task
Reporter: Duo Zhang


For the async table the sleep pause is determined only by the retry number; at 
the least we should also take the exception into account 
(MultiActionResultTooLarge, CallQueueTooBig...).
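A rough sketch of what exception-aware pause selection could look like, 
assuming the usual retry backoff table (mirroring HConstants.RETRY_BACKOFF); 
the method names here are illustrative, not the final design:

{code}
import org.apache.hadoop.hbase.CallQueueTooBigException;
import org.apache.hadoop.hbase.MultiActionResultTooLarge;

// Illustrative sketch: pick the base pause from the exception type first,
// then apply the per-retry backoff multiplier.
public class BackoffSketch {
  private static final int[] RETRY_BACKOFF =
      {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

  static long getPauseTime(long basePause, int tries) {
    int idx = Math.min(tries, RETRY_BACKOFF.length - 1);
    return basePause * RETRY_BACKOFF[idx];
  }

  static long pauseFor(Throwable error, int tries, long normalPause, long longPause) {
    if (error instanceof MultiActionResultTooLarge) {
      return 0; // retry immediately: the server processed part of the batch
    }
    // The server explicitly asked us to back off, so use a longer base pause.
    long base = (error instanceof CallQueueTooBigException) ? longPause : normalPause;
    return getPauseTime(base, tries);
  }
}
{code}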



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17149) Procedure v2 - Fix nonce submission

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769312#comment-15769312
 ] 

stack commented on HBASE-17149:
---

Giving up on the backport. The differences are too extreme. You might have 
better luck when you get back, [~syuanjiang].

> Procedure v2 - Fix nonce submission
> ---
>
> Key: HBASE-17149
> URL: https://issues.apache.org/jira/browse/HBASE-17149
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
> Fix For: 2.0.0
>
> Attachments: HBASE-17149.master.001.patch, 
> HBASE-17149.master.002.patch, HBASE-17149.master.002.patch, 
> HBASE-17149.master.002.patch, HBASE-17149.master.003.patch, nonce.patch
>
>
> Instead of having all the logic in submitProcedure(), split it into 
> registerNonce() + submitProcedure().
> This way we can avoid calling the coprocessor twice and have clean submit 
> logic, knowing that there will be only one submission.
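A toy sketch of the proposed split, using stand-in types rather than the real 
ProcedureExecutor API:

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: only the caller that wins registerNonce() runs the
// coprocessor hook and submits, so the coprocessor is called once and there
// is exactly one submission per nonce.
public class NonceSubmitSketch {
  private final ConcurrentHashMap<Long, Long> nonceToProcId = new ConcurrentHashMap<>();
  private final AtomicLong procIds = new AtomicLong();

  /** Returns true if this caller owns the nonce and must do the actual submit. */
  public boolean registerNonce(long nonce) {
    return nonceToProcId.putIfAbsent(nonce, -1L) == null;
  }

  public long submitProcedure(long nonce, Runnable coprocessorHook) {
    coprocessorHook.run();                    // runs exactly once per nonce
    long procId = procIds.incrementAndGet();
    nonceToProcId.put(nonce, procId);         // publish the id for duplicate callers
    return procId;
  }

  /** Duplicate callers look up (or wait for) the winner's procedure id. */
  public long procIdFor(long nonce) {
    Long id = nonceToProcId.get(nonce);
    return id == null ? -1L : id;
  }
}
{code}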



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources

2016-12-21 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769305#comment-15769305
 ] 

Phil Yang commented on HBASE-17314:
---

{quote}
This looks like it could be package private rather than public:
2344    public ReplicationSourceService getReplicationSourceService() {
{quote}

It is used in 
org.apache.hadoop.hbase.replication.regionserver.TestGlobalThrottler, so it 
has to be public.

> Limit total buffered size for all replication sources
> -
>
> Key: HBASE-17314
> URL: https://issues.apache.org/jira/browse/HBASE-17314
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, 
> HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch
>
>
> If we have many peers, or some servers have many recovered queues, we will 
> hold many entries in memory, which increases GC pressure and may even cause 
> OOM, because by default we read up to 64MB of entries into the buffer for one 
> source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable

2016-12-21 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-17262:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Refactor RpcServer so as to make it extendable and/or pluggable
> ---
>
> Key: HBASE-17262
> URL: https://issues.apache.org/jira/browse/HBASE-17262
> Project: HBase
>  Issue Type: Sub-task
>  Components: rpc
>Affects Versions: 2.0.0
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0
>
> Attachments: HBASE-17262.master.V1.patch, 
> HBASE-17262.master.V2.patch, HBASE-17262.master.V3.patch, 
> HBASE-17262.master.V4.patch, HBASE-17262.master.V5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17160) Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to shell

2016-12-21 Thread ChiaPing Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769298#comment-15769298
 ] 

ChiaPing Tsai commented on HBASE-17160:
---

It works for me. Thanks a lot.

> Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to 
> shell
> -
>
> Key: HBASE-17160
> URL: https://issues.apache.org/jira/browse/HBASE-17160
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17160.addendum.txt, HBASE-17160.master.001.patch, 
> HBASE-17160.master.002.patch, HBASE-17160.master.002.patch, hbase.png, 
> minor_hbase.png, untangled_hbase.png
>
>
> Very minor untangling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable

2016-12-21 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769283#comment-15769283
 ] 

binlijin commented on HBASE-17262:
--

The UT failure is unrelated to this patch.

> Refactor RpcServer so as to make it extendable and/or pluggable
> ---
>
> Key: HBASE-17262
> URL: https://issues.apache.org/jira/browse/HBASE-17262
> Project: HBase
>  Issue Type: Sub-task
>  Components: rpc
>Affects Versions: 2.0.0
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0
>
> Attachments: HBASE-17262.master.V1.patch, 
> HBASE-17262.master.V2.patch, HBASE-17262.master.V3.patch, 
> HBASE-17262.master.V4.patch, HBASE-17262.master.V5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769286#comment-15769286
 ] 

stack commented on HBASE-17314:
---

Go ahead and push patch w/ fix I'd say [~yangzhe1991] when you have one.

> Limit total buffered size for all replication sources
> -
>
> Key: HBASE-17314
> URL: https://issues.apache.org/jira/browse/HBASE-17314
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, 
> HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch
>
>
> If we have many peers, or some servers have many recovered queues, we will 
> hold many entries in memory, which increases GC pressure and may even cause 
> OOM, because by default we read up to 64MB of entries into the buffer for one 
> source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable

2016-12-21 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769287#comment-15769287
 ] 

binlijin commented on HBASE-17262:
--

Pushed to master.

> Refactor RpcServer so as to make it extendable and/or pluggable
> ---
>
> Key: HBASE-17262
> URL: https://issues.apache.org/jira/browse/HBASE-17262
> Project: HBase
>  Issue Type: Sub-task
>  Components: rpc
>Affects Versions: 2.0.0
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0
>
> Attachments: HBASE-17262.master.V1.patch, 
> HBASE-17262.master.V2.patch, HBASE-17262.master.V3.patch, 
> HBASE-17262.master.V4.patch, HBASE-17262.master.V5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17334) Add locate row before/after support for AsyncRegionLocator

2016-12-21 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17334:
--
Attachment: HBASE-17334-v2.patch

Add test for locate after.

> Add locate row before/after support for AsyncRegionLocator
> --
>
> Key: HBASE-17334
> URL: https://issues.apache.org/jira/browse/HBASE-17334
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17334-v1.patch, HBASE-17334-v2.patch, 
> HBASE-17334.patch
>
>
> Now we only have a getPreviousRegionLocation method, which is used only for 
> reverse scans, and it is not perfect as it can not deal with region merges. As 
> we want to add inclusive/exclusive support for the start row and end row of a 
> scan, we need to implement a general locate-row-before/after method for 
> AsyncRegionLocator.
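A minimal sketch of the shape such an API could take; the names below are 
illustrative, not from the patch:

{code}
import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;

// Illustrative shape of a direction-aware locate API: callers ask for the
// region holding the row, or the one strictly before/after it. It also stays
// correct across region merges since each lookup hits current meta.
public interface AsyncRegionLocatorSketch {
  enum LocateType { BEFORE, CURRENT, AFTER }

  CompletableFuture<HRegionLocation> getRegionLocation(
      TableName tableName, byte[] row, LocateType type, boolean reload);
}
{code}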



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources

2016-12-21 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769280#comment-15769280
 ] 

Phil Yang commented on HBASE-17314:
---

Thank you. The reason is that after HBASE-11392, adding a peer goes through the 
master, so we must start a cluster first and then add the peer. Before that 
change it was ok to add the peer first, because doing so only added a znode on 
ZK.
{code}
@@ -94,12 +94,13 @@ public class TestGlobalThrottler {
 ReplicationAdmin admin1 = new ReplicationAdmin(conf1);
 ReplicationPeerConfig rpc = new ReplicationPeerConfig();
 rpc.setClusterKey(utility2.getClusterKey());
-admin1.addPeer("peer1", rpc, null);
-admin1.addPeer("peer2", rpc, null);
-admin1.addPeer("peer3", rpc, null);
 
 utility1.startMiniCluster(1, 1);
 utility2.startMiniCluster(1, 1);
+
+admin1.addPeer("peer1", rpc, null);
+admin1.addPeer("peer2", rpc, null);
+admin1.addPeer("peer3", rpc, null);
   }

{code}

Will upload a new patch with your suggestions. Thanks.

> Limit total buffered size for all replication sources
> -
>
> Key: HBASE-17314
> URL: https://issues.apache.org/jira/browse/HBASE-17314
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, 
> HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch
>
>
> If we have many peers, or some servers have many recovered queues, we will 
> hold many entries in memory, which increases GC pressure and may even cause 
> OOM, because by default we read up to 64MB of entries into the buffer for one 
> source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769235#comment-15769235
 ] 

stack commented on HBASE-17314:
---

[~yangzhe1991] Here boss 
https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/5013/testReport/org.apache.hadoop.hbase.replication.regionserver/TestGlobalThrottler/org_apache_hadoop_hbase_replication_regionserver_TestGlobalThrottler/



> Limit total buffered size for all replication sources
> -
>
> Key: HBASE-17314
> URL: https://issues.apache.org/jira/browse/HBASE-17314
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, 
> HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch
>
>
> If we have many peers, or some servers have many recovered queues, we will 
> hold many entries in memory, which increases GC pressure and may even cause 
> OOM, because by default we read up to 64MB of entries into the buffer for one 
> source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16859) Use Bytebuffer pool for non java clients specifically for scans/gets

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769229#comment-15769229
 ] 

stack commented on HBASE-16859:
---

Go for it [~ram_krish] (I have no numbers on native vs non-native clients)

> Use Bytebuffer pool for non java clients specifically for scans/gets
> 
>
> Key: HBASE-16859
> URL: https://issues.apache.org/jira/browse/HBASE-16859
> Project: HBase
>  Issue Type: Sub-task
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-16859_V1.patch, HBASE-16859_V2.patch, 
> HBASE-16859_V2.patch, HBASE-16859_V4.patch, HBASE-16859_V5.patch, 
> HBASE-16859_V6.patch
>
>
> In the case of non-java clients we still write the results and header into an 
> on-demand byte[]. This can be changed to use the BBPool (onheap or offheap 
> buffer?).
> But the basic problem is to identify whether the response is for scans/gets. 
> - One easy way to do it is to use the MethodDescriptor per Call and use the 
> name of the MethodDescriptor to identify a scan/get. But this pollutes 
> RpcServer with checks for scan/get type responses.
> - Another way is to always set the result to cellScanner, but we know that 
> isClientCellBlockSupported is going to be false for non-PB clients, so we can 
> ignore the cellScanner and go ahead with the results in PB. But this is not 
> clean.
> - The third is that we already have an RpcCallContext being passed to the RS. 
> In the case of scans/gets/multiGets we already set an RpcCallback for the 
> shipped call. So here, on response, we can check whether the callback is not 
> null and check isClientCellBlockSupported. In this case we can get the BB from 
> the pool and write the result and header to that BB (see the sketch below). 
> Maybe this looks clean?
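A toy sketch of that third option with a trivial pool; all names here are 
illustrative, and the real RpcServer/RpcCallContext wiring differs:

{code}
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Toy sketch: a trivial buffer pool plus the decision point described above.
public class ResponseBufferSketch {
  private final ConcurrentLinkedQueue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();
  private final int bufSize;

  public ResponseBufferSketch(int bufSize) { this.bufSize = bufSize; }

  ByteBuffer take() {
    ByteBuffer bb = pool.poll();
    return bb != null ? bb : ByteBuffer.allocateDirect(bufSize); // offheap here; could be onheap
  }

  void giveBack(ByteBuffer bb) {
    bb.clear();
    pool.offer(bb);
  }

  /** Pooled BB only for shipped calls whose client takes no cell block. */
  ByteBuffer encodeResponse(boolean hasShippedCallback, boolean clientCellBlockSupported,
                            byte[] header, byte[] pbResult) {
    if (hasShippedCallback && !clientCellBlockSupported
        && header.length + pbResult.length <= bufSize) {
      ByteBuffer bb = take();               // reuse a pooled buffer, no on-demand byte[]
      bb.put(header).put(pbResult);
      bb.flip();
      return bb;                            // giveBack(bb) once the socket write completes
    }
    byte[] out = new byte[header.length + pbResult.length]; // the old on-demand allocation
    System.arraycopy(header, 0, out, 0, header.length);
    System.arraycopy(pbResult, 0, out, header.length, pbResult.length);
    return ByteBuffer.wrap(out);
  }
}
{code}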



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable

2016-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769227#comment-15769227
 ] 

Hadoop QA commented on HBASE-17262:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 34s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
0s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
35s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
44s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
25m 50s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 46s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s 
{color} | {color:green} hbase-it in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 139m 39s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12844345/HBASE-17262.master.V5.patch
 |
| JIRA Issue | HBASE-17262 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 56ca5999b2df 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / d787155 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5019/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/5019/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results

[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources

2016-12-21 Thread Phil Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769210#comment-15769210
 ] 

Phil Yang commented on HBASE-17314:
---

bq. TestGlobalThrottler hangs in master build.

Which build did it hang in? Any logs for the test? Thanks.

> Limit total buffered size for all replication sources
> -
>
> Key: HBASE-17314
> URL: https://issues.apache.org/jira/browse/HBASE-17314
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, 
> HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch
>
>
> If we have many peers, or some servers have many recovered queues, we will 
> hold many entries in memory, which increases GC pressure and may even cause 
> OOM, because by default we read up to 64MB of entries into the buffer for one 
> source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17345) Implement batch

2016-12-21 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769199#comment-15769199
 ] 

Yu Li commented on HBASE-17345:
---

Overall LGTM, some comments:

About {{ConnectionUtils}}:
In {{voidBatch}} and {{voidBatchAll}}, mind explaining why {{table. 
batch(actions)}} rather than {{table. batch(actions)}}?

About {{AsyncTableBase}}:
1. Add javadoc for the newly added methods: exists(List)/existsAll, 
put(List)/putAll, delete(List)/deleteAll, batch(List)/batchAll?
2. Add more UT cases to cover them?

About {{TestAsyncGetMultiThread}}:
1. Now it makes chaos for each split key, including split-and-compact, balance 
and move, sleeping 5 seconds in between each, which makes the test run for 
over 2 min. Maybe simplify it a little bit so the test finishes faster?
2. The name also feels confusing; maybe TestAsyncGetWithMultiThread is better?

Thanks.

> Implement batch
> ---
>
> Key: HBASE-17345
> URL: https://issues.apache.org/jira/browse/HBASE-17345
> Project: HBase
>  Issue Type: Sub-task
>  Components: asyncclient, Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17345.patch
>
>
> Add the support for general batch based on the code introduced in HBASE-17142.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17355) Create a simplifed version of flush scanner

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769194#comment-15769194
 ] 

stack commented on HBASE-17355:
---

I like the reduce-overhead-by-50% story.

> Create a simplifed version of flush scanner
> ---
>
> Key: HBASE-17355
> URL: https://issues.apache.org/jira/browse/HBASE-17355
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-17354.patch, after patch.png, before patch.png
>
>
> Currently we use StoreScanner for performing flushes, which actually goes row 
> by row. Probably that is not needed, and we could instead go with a simple 
> loop that collects the cells and writes them to the file. The write path 
> already has the required sanity checks, so the store scanner does not need to 
> do them. 
> Also, the limit retrieved in one next() call could be made equivalent to the 
> configured block size, as we do for compaction.
> Are there any filters we need to apply in a flush (I mean any version check or 
> deletion handling)? If so, this simplified version will not work. I may be 
> missing something; if so, we need to see what those cases are and add them 
> here.
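A minimal sketch of such a loop, with CellSink as a stand-in for the store 
file writer (names are illustrative, not from the patch):

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.regionserver.KeyValueScanner;

// Illustrative sketch of the simple flush loop: drain the memstore scanner and
// append cells straight to the writer, with no per-row StoreScanner matching
// (the write path already performs the required sanity checks).
public class SimpleFlushSketch {
  interface CellSink { void append(Cell cell) throws IOException; }

  static void flush(KeyValueScanner memstoreScanner, CellSink writer) throws IOException {
    Cell cell;
    while ((cell = memstoreScanner.next()) != null) {
      writer.append(cell); // note: no version/delete filtering, per the open question above
    }
  }
}
{code}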



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17160) Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to shell

2016-12-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-17160:
--
Attachment: HBASE-17160.addendum.txt

This worked for me [~chia7712]. Does it work for you? If so, I'll commit. 
Thanks.

> Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to 
> shell
> -
>
> Key: HBASE-17160
> URL: https://issues.apache.org/jira/browse/HBASE-17160
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17160.addendum.txt, HBASE-17160.master.001.patch, 
> HBASE-17160.master.002.patch, HBASE-17160.master.002.patch, hbase.png, 
> minor_hbase.png, untangled_hbase.png
>
>
> Very minor untangling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17160) Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to shell

2016-12-21 Thread ChiaPing Tsai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769077#comment-15769077
 ] 

ChiaPing Tsai commented on HBASE-17160:
---

I run the "mvn clean test 
-Dtest=org.apache.hadoop.hbase.client.TestRpcControllerFactory -X” for checking 
the classpath, and then i find it includes the 
hbase-hadoop-compat/target/classes but no 
hbase-hadoop-compat/target/test_classes.
The reasons for the lack could be that the transitive dependencies doesn’t 
include the test scope automatically.
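For reference, the usual remedy is to declare an explicit test-jar dependency, 
since Maven never resolves test-scoped classes transitively; a sketch of the 
dependency block:

{code}
<!-- Depend on the module's test classes explicitly; Maven does not pull
     test scope in transitively. The module must attach a test-jar via the
     maven-jar-plugin "test-jar" goal for this artifact to exist. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-hadoop-compat</artifactId>
  <version>${project.version}</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
{code}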

> Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to 
> shell
> -
>
> Key: HBASE-17160
> URL: https://issues.apache.org/jira/browse/HBASE-17160
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17160.master.001.patch, 
> HBASE-17160.master.002.patch, HBASE-17160.master.002.patch, hbase.png, 
> minor_hbase.png, untangled_hbase.png
>
>
> Very minor untangling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17355) Create a simplifed version of flush scanner

2016-12-21 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769018#comment-15769018
 ] 

ramkrishna.s.vasudevan commented on HBASE-17355:


I don't think it is as simple as in the patch. It may need some more tweaks, 
but yes, we can reduce the number of comparisons and cut at least 50% of the 
overhead here.

> Create a simplifed version of flush scanner
> ---
>
> Key: HBASE-17355
> URL: https://issues.apache.org/jira/browse/HBASE-17355
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-17354.patch, after patch.png, before patch.png
>
>
> Currently we use StoreScanner for performing flushes, which actually goes row 
> by row. Probably that is not needed, and we could instead go with a simple 
> loop that collects the cells and writes them to the file. The write path 
> already has the required sanity checks, so the store scanner does not need to 
> do them. 
> Also, the limit retrieved in one next() call could be made equivalent to the 
> configured block size, as we do for compaction.
> Are there any filters we need to apply in a flush (I mean any version check or 
> deletion handling)? If so, this simplified version will not work. I may be 
> missing something; if so, we need to see what those cases are and add them 
> here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17101) FavoredNodes should not apply to system tables

2016-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769016#comment-15769016
 ] 

Hadoop QA commented on HBASE-17101:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
48s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
43s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
41s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
25m 14s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 89m 11s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 126m 0s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12844340/HBASE-17101.master.003.patch
 |
| JIRA Issue | HBASE-17101 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 76537c1822ac 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / d787155 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5018/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5018/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> FavoredNodes should not apply to system tables
> --
>
> Key: HBASE-17101
> URL: https://issues.apache.org/jira/browse/HBASE-17101
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
> Fix For: 2.0.0
>
> Attachments: HBASE-17101.master.001.patch, 
> HBASE-17101.master.002.

[jira] [Commented] (HBASE-16981) Expand Mob Compaction Partition policy from daily to weekly, monthly and beyond

2016-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768994#comment-15768994
 ] 

Hadoop QA commented on HBASE-16981:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} rubocop {color} | {color:blue} 0m 0s 
{color} | {color:blue} rubocop was not available. {color} |
| {color:blue}0{color} | {color:blue} ruby-lint {color} | {color:blue} 0m 0s 
{color} | {color:blue} Ruby-lint was not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
34s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
44s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
11s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
30m 17s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
13s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 37s 
{color} | {color:red} hbase-server generated 1 new + 1 unchanged - 0 fixed = 2 
total (was 1) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 8s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 97m 22s 
{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 33s 
{color} | {color:green} hbase-shell in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
39s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 158m 34s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12844330/HBASE-16981.master.003.patch
 |
| JIRA Issue | HBASE-16981 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  rubocop  ruby_lint  |
| uname | Linux 13a0b7784fe1 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 
21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component

[jira] [Updated] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable

2016-12-21 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-17262:
-
Component/s: (was: Performance)

> Refactor RpcServer so as to make it extendable and/or pluggable
> ---
>
> Key: HBASE-17262
> URL: https://issues.apache.org/jira/browse/HBASE-17262
> Project: HBase
>  Issue Type: Sub-task
>  Components: rpc
>Affects Versions: 2.0.0
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0
>
> Attachments: HBASE-17262.master.V1.patch, 
> HBASE-17262.master.V2.patch, HBASE-17262.master.V3.patch, 
> HBASE-17262.master.V4.patch, HBASE-17262.master.V5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable

2016-12-21 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-17262:
-
Attachment: HBASE-17262.master.V5.patch

> Refactor RpcServer so as to make it extendable and/or pluggable
> ---
>
> Key: HBASE-17262
> URL: https://issues.apache.org/jira/browse/HBASE-17262
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance, rpc
>Affects Versions: 2.0.0
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0
>
> Attachments: HBASE-17262.master.V1.patch, 
> HBASE-17262.master.V2.patch, HBASE-17262.master.V3.patch, 
> HBASE-17262.master.V4.patch, HBASE-17262.master.V5.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17101) FavoredNodes should not apply to system tables

2016-12-21 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768796#comment-15768796
 ] 

Thiruvel Thirumoolan commented on HBASE-17101:
--

Updated patch with review comments from reviewboard addressed.

> FavoredNodes should not apply to system tables
> --
>
> Key: HBASE-17101
> URL: https://issues.apache.org/jira/browse/HBASE-17101
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
> Fix For: 2.0.0
>
> Attachments: HBASE-17101.master.001.patch, 
> HBASE-17101.master.002.patch, HBASE-17101.master.003.patch, 
> HBASE_17101_rough_draft.patch
>
>
> As described in the doc (see HBASE-15531), we would like to start with user 
> tables for favored nodes. This task ensures FN does not apply to system 
> tables.
> System tables are in memory and won't benefit from favored nodes. Since we 
> also maintain FN information for user regions in meta, ignoring system tables 
> keeps the implementation simpler for the first iterations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17101) FavoredNodes should not apply to system tables

2016-12-21 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HBASE-17101:
-
Attachment: HBASE-17101.master.003.patch

> FavoredNodes should not apply to system tables
> --
>
> Key: HBASE-17101
> URL: https://issues.apache.org/jira/browse/HBASE-17101
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
> Fix For: 2.0.0
>
> Attachments: HBASE-17101.master.001.patch, 
> HBASE-17101.master.002.patch, HBASE-17101.master.003.patch, 
> HBASE_17101_rough_draft.patch
>
>
> As described in the doc (see HBASE-15531), we would like to start with user 
> tables for favored nodes. This task ensures FN does not apply to system 
> tables.
> System tables are in memory and won't benefit from favored nodes. Since we 
> also maintain FN information for user regions in meta, ignoring system tables 
> keeps the implementation simpler for the first iterations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768767#comment-15768767
 ] 

Hudson commented on HBASE-17341:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #82 (See 
[https://builds.apache.org/job/HBase-1.2-JDK7/82/])
HBASE-17341 Add a timeout during replication endpoint termination (apurtell: 
rev 18dc7386bc9adff834db851a38306989fb3fd4a6)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java


> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short-term fix addressed here is to simply add a timeout for 
> Future.get(). But the severe consequences seen here perhaps suggest that a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.
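A minimal sketch of that short-term fix, bounding the wait on the endpoint's 
stop future (method names here are illustrative):

{code}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: bound the wait so a wedged ReplicationEndpoint#stop() cannot block
// the calling thread (which may be the ZK event thread) forever.
public class TerminateWithTimeoutSketch {
  static void awaitEndpointStop(Future<?> stopFuture, long timeoutMillis) {
    try {
      stopFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Stop did not finish in time: log and move on instead of hanging.
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } catch (ExecutionException e) {
      // The endpoint's stop() itself failed; log the cause.
    }
  }
}
{code}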



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17257) Add column-aliasing capability to hbase-client

2016-12-21 Thread Daniel Vimont (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768757#comment-15768757
 ] 

Daniel Vimont commented on HBASE-17257:
---

Patch v5 now available for perusal on Review Board: 
https://reviews.apache.org/r/54635/

> Add column-aliasing capability to hbase-client
> --
>
> Key: HBASE-17257
> URL: https://issues.apache.org/jira/browse/HBASE-17257
> Project: HBase
>  Issue Type: New Feature
>  Components: Client
>Affects Versions: 2.0.0
>Reporter: Daniel Vimont
>Assignee: Daniel Vimont
>  Labels: features
> Attachments: HBASE-17257-v2.patch, HBASE-17257-v3.patch, 
> HBASE-17257-v4.patch, HBASE-17257-v5.patch, HBASE-17257.patch
>
>
> Review Board link: https://reviews.apache.org/r/54635/
> Column aliasing will provide the option for a 1, 2, or 4 byte alias value to 
> be stored in each cell of an "alias enabled" column-family, in place of the 
> full-length column-qualifier. Aliasing is intended to operate completely 
> invisibly to the end-user developer, with absolutely no "awareness" of 
> aliasing required to be coded into a front-end application. No new public 
> hbase-client interfaces are to be introduced, and only a few new public 
> methods should need to be added to existing interfaces, primarily to allow an 
> administrator to designate that a new column-family is to be alias-enabled by 
> setting its aliasSize attribute to 1, 2, or 4.
> To facilitate such functionality, new subclasses of HTable, 
> BufferedMutatorImpl, and HTableMultiplexer are to be provided. The overriding 
> methods of these new subclasses will invoke methods of the new AliasManager 
> class to facilitate qualifier-to-alias conversions (for user-submitted Gets, 
> Scans, and Mutations) and alias-to-qualifier conversions (for Results 
> returned from HBase) for any Table that has one or more alias-enabled column 
> families. All conversion logic will be encapsulated in the new AliasManager 
> class, and all qualifier-to-alias mappings will be persisted in a new 
> aliasMappingTable in a new, reserved namespace.
> An informal polling of HBase users at HBaseCon East and at the 
> Strata/Hadoop-World conference in Sept. 2016 showed that Column Aliasing 
> could be a popular enhancement to standard HBase functionality, due to the 
> fact that full column-qualifiers are stored in each cell, and reducing this 
> qualifier storage requirement down to 1, 2, or 4 bytes per cell could prove 
> beneficial in terms of reduced storage and bandwidth needs. Aliasing is 
> intended chiefly for column-families which are of the "narrow and tall" 
> variety (i.e., that are designed to use relatively few distinct 
> column-qualifiers throughout a large number of rows, throughout the lifespan 
> of the column-family). A column-family that is set up with an alias-size of 1 
> byte can contain up to 255 unique column-qualifiers; a 2 byte alias-size 
> allows for up to 65,535 unique column-qualifiers; and a 4 byte alias-size 
> allows for up to 4,294,967,295 unique column-qualifiers.
> Fuller specifications will be entered into the comments section below. Note 
> that it may well not be viable to add aliasing support in the new "async" 
> classes that appear to be currently under development.
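A toy, in-memory sketch of the alias bookkeeping described above (the proposed 
AliasManager would persist such mappings in the aliasMappingTable; this version 
only illustrates the fixed-width encoding):

{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy sketch: fixed-width 1/2/4 byte aliases in place of full qualifiers.
public class AliasSketch {
  private final int aliasSize;  // 1, 2, or 4
  private final Map<String, byte[]> qualifierToAlias = new HashMap<>();
  private final Map<ByteBuffer, byte[]> aliasToQualifier = new HashMap<>();
  private long next = 1;        // 1-byte aliases => up to 255 distinct qualifiers

  public AliasSketch(int aliasSize) { this.aliasSize = aliasSize; }

  public byte[] aliasFor(byte[] qualifier) {
    return qualifierToAlias.computeIfAbsent(
        new String(qualifier, StandardCharsets.ISO_8859_1), q -> {
      long max = (1L << (8 * aliasSize)) - 1;   // 255, 65535, or 4294967295
      if (next > max) throw new IllegalStateException("alias space exhausted");
      byte[] alias = new byte[aliasSize];
      long v = next++;
      for (int i = aliasSize - 1; i >= 0; i--) { alias[i] = (byte) v; v >>>= 8; }
      aliasToQualifier.put(ByteBuffer.wrap(alias), qualifier);
      return alias;
    });
  }

  public byte[] qualifierFor(byte[] alias) {
    return aliasToQualifier.get(ByteBuffer.wrap(alias));
  }
}
{code}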



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15130) Backport 0.98 Scan different TimeRange for each column family

2016-12-21 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-15130:
---
Fix Version/s: (was: 0.98.24)

> Backport 0.98 Scan different TimeRange for each column family 
> --
>
> Key: HBASE-15130
> URL: https://issues.apache.org/jira/browse/HBASE-15130
> Project: HBase
>  Issue Type: Bug
>  Components: Client, regionserver, Scanners
>Affects Versions: 0.98.17
>Reporter: churro morales
>Assignee: churro morales
> Attachments: HBASE-15130-0.98.patch, HBASE-15130-0.98.v1.patch, 
> HBASE-15130-0.98.v1.patch, HBASE-15130-0.98.v2.patch, 
> HBASE-15130-0.98.v3.patch, HBASE-15130-0.98.v4.patch
>
>
> branch 98 version backport for HBASE-14355



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16663) JMX ConnectorServer stopped when unauthorized user try to stop HM/RS/cluster

2016-12-21 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-16663:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> JMX ConnectorServer stopped when unauthorized user try to stop HM/RS/cluster
> 
>
> Key: HBASE-16663
> URL: https://issues.apache.org/jira/browse/HBASE-16663
> Project: HBase
>  Issue Type: Bug
>  Components: metrics, security
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.9, 0.98.24, 1.2.4
>
> Attachments: 16663-branch-1.1.00.patch, 16663.branch-1.1.patch, 
> 16663.branch-1.1.patch, HBASE-16663-0.98-V4.patch, HBASE-16663-0.98.patch, 
> HBASE-16663-V2.patch, HBASE-16663-V3.patch, HBASE-16663-V4.patch, 
> HBASE-16663-branch-1.patch, HBASE-16663.patch
>
>
> After HBASE-16284, an unauthorized user will not be allowed to stop 
> HM/RS/cluster, but while executing "cpHost.preStopMaster()", the 
> ConnectorServer will be stopped before the AccessController validation.
> hbase-site.xml,
> {noformat}
> <property>
>   <name>hbase.coprocessor.master.classes</name>
>   <value>org.apache.hadoop.hbase.JMXListener,org.apache.hadoop.hbase.security.access.AccessController</value>
> </property>
> <property>
>   <name>hbase.coprocessor.regionserver.classes</name>
>   <value>org.apache.hadoop.hbase.JMXListener,org.apache.hadoop.hbase.security.access.AccessController</value>
> </property>
> {noformat}
> HBaseAdmin.stopMaster(),
> {noformat}
> 2016-09-20 21:12:26,796 INFO  
> [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16000] 
> hbase.JMXListener: ConnectorServer stopped!
> 2016-09-20 21:13:55,380 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16000] 
> security.ShellBasedUnixGroupsMapping: got exception trying to get groups for 
> user P72981
> ExitCodeException exitCode=1: id: P72981: No such user
> 2016-09-20 21:14:00,495 ERROR 
> [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16000] 
> master.MasterRpcServices: Exception occurred while stopping master
> org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient 
> permissions for user 'P72981' (global, action=ADMIN)
>   at 
> org.apache.hadoop.hbase.security.access.AccessController.requireGlobalPermission(AccessController.java:546)
>   at 
> org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:522)
>   at 
> org.apache.hadoop.hbase.security.access.AccessController.preStopMaster(AccessController.java:1297)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$68.call(MasterCoprocessorHost.java:821)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.execOperation(MasterCoprocessorHost.java:1188)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.preStopMaster(MasterCoprocessorHost.java:817)
>   at org.apache.hadoop.hbase.master.HMaster.stopMaster(HMaster.java:2352)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.stopMaster(MasterRpcServices.java:1364)
> {noformat}
> HBaseAdmin.stopRegionServer(rs-host-port),
> {noformat}
> 2016-09-20 20:59:01,234 INFO  
> [RpcServer.FifoWFPBQ.priority.handler=18,queue=0,port=16020] 
> hbase.JMXListener: ConnectorServer stopped!
> 2016-09-20 20:59:01,250 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=18,queue=0,port=16020] 
> security.ShellBasedUnixGroupsMapping: got exception trying to get groups for 
> user P72981
> ExitCodeException exitCode=1: id: P72981: No such user
> 2016-09-20 20:59:01,253 WARN  
> [RpcServer.FifoWFPBQ.priority.handler=18,queue=0,port=16020] 
> regionserver.HRegionServer: The region server did not stop
> org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient 
> permissions for user 'P72981' (global, action=ADMIN)
>   at 
> org.apache.hadoop.hbase.security.access.AccessController.requireGlobalPermission(AccessController.java:546)
>   at 
> org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:522)
>   at 
> org.apache.hadoop.hbase.security.access.AccessController.preStopRegionServer(AccessController.java:2501)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost$1.call(RegionServerCoprocessorHost.java:84)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.execOperation(RegionServerCoprocessorHost.java:256)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.preStop(RegionServerCoprocessorHost.java:80)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1905)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.stopServer(RSRpcServices.java:1961)
> {noformat}
> HBaseAdmin.shutdown(),
> {noformat}
> 2016-09-21 12:09:08,259 IN

[jira] [Commented] (HBASE-17345) Implement batch

2016-12-21 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768749#comment-15768749
 ] 

Duo Zhang commented on HBASE-17345:
---

{quote}
We take startLogErrorsCnt as a param but ignore it?
{quote}
It is just for debugging... Will do some cleanup in the next patch.

{quote}
You make a new Action from passed-in Action because you don't want to modify 
passed-in params?
{quote}
The action passed in is a Row, i.e., Get, Put, Delete, etc., and we use it to 
construct an Action object. The Action records the originalIndex of the 
operation and also carries the nonce.
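
To make that concrete, here is a rough sketch of the wrapping (class and field 
names below are illustrative stand-ins, not the patch itself):
{code}
import java.util.ArrayList;
import java.util.List;

// Sketch only: wrap each user-supplied operation so it remembers its position
// in the original batch; non-idempotent ops would additionally carry a nonce.
class ActionSketch {
  static class Action {
    final Object row;          // stand-in for a Get/Put/Delete
    final int originalIndex;   // lets results map back to the input order
    long nonce;                // set only for non-idempotent operations

    Action(Object row, int originalIndex) {
      this.row = row;
      this.originalIndex = originalIndex;
    }
  }

  static List<Action> wrap(List<Object> rows) {
    List<Action> actions = new ArrayList<>(rows.size());
    for (int i = 0; i < rows.size(); i++) {
      actions.add(new Action(rows.get(i), i));
    }
    return actions;
  }
}
{code}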


{quote}
super nit: you can presize the following (patch line 148): this.action2Errors = 
new IdentityHashMap<>();
{quote}
If there is no error then the map will remain empty after we finish. I think 
this is the common case?

{quote}
This is just to log? (patch line 208) long currentTime = 
System.currentTimeMillis(); i.e. all timing is with nanos but millis is just 
for logging?
{quote}
Just following the old log pattern. It is used to construct the error message 
of RetriesExhaustedException, and I think that is reasonable since a date is 
more friendly for the user (think of PrintGCTimeStamps vs. PrintGCDateStamps).
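
To illustrate the pattern (a minimal sketch, not the patch's code): nanos drive 
the timing decisions, and millis only feed the human-readable message.
{code}
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Sketch: monotonic nanos for elapsed-time logic, wall-clock millis only for
// the readable error message (the PrintGCDateStamps analogy above).
public class TimingSketch {
  public static void main(String[] args) throws InterruptedException {
    long startNs = System.nanoTime();
    long startMs = System.currentTimeMillis();
    Thread.sleep(5); // stand-in for the retries
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs);
    System.out.println("started at " + new Date(startMs)
        + ", gave up after " + elapsedMs + " ms");
  }
}
{code}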

{quote}
What do you see AsyncBatchRpcRetryingCaller replacing in our current stack? It 
seems to do AP and a bunch of our Callable infra. Should 
AsyncBatchRpcRetryingCaller implement Callable? Or what you thinking?
{quote}
I plan to use it to replace AsyncProcess. And there is no callable in the 
current client implementation stack (or maybe some simple ones, see 
AsyncSingleRequestRpcRetryingCaller). With this patch, I think most retrying 
callers for the async table are in place. The exceptions are read replica 
support (HBASE-17356) and endpoint support (HBASE-17346). And we still need to 
improve the scan implementation (mvcc, inclusive/exclusive of start row and end 
row, etc.). But I think it is time to think about building the old blocking API 
on top of the new async API and getting rid of the old code.

{quote}
Why we have AsyncTable and AsyncTableBase again? Do we have to have the two 
Interfaces?
{quote}
They were introduced when implementing scan. See the discussion in 
HBASE-16984. We can discuss later whether we can have just one AsyncTable 
interface.

{quote}
Do you have to rename TestAsyncGetMultiThread ? And/or TestAsyncTableMultiGet?
{quote}
No, it is get, not multi get... I will rename TestAsyncTableMultiGet and add 
other batch tests to it.

Thanks.

> Implement batch
> ---
>
> Key: HBASE-17345
> URL: https://issues.apache.org/jira/browse/HBASE-17345
> Project: HBase
>  Issue Type: Sub-task
>  Components: asyncclient, Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17345.patch
>
>
> Add the support for general batch based on the code introduced in HBASE-17142.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16981) Expand Mob Compaction Partition policy from daily to weekly, monthly and beyond

2016-12-21 Thread huaxiang sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huaxiang sun updated HBASE-16981:
-
Status: Patch Available  (was: Open)

Hi [~jingcheng.du] and [~anoop.hbase], I posted the v3 patch based on the new 
design. Could you help review it? Thanks!

> Expand Mob Compaction Partition policy from daily to weekly, monthly and 
> beyond
> ---
>
> Key: HBASE-16981
> URL: https://issues.apache.org/jira/browse/HBASE-16981
> Project: HBase
>  Issue Type: New Feature
>  Components: mob
>Affects Versions: 2.0.0
>Reporter: huaxiang sun
>Assignee: huaxiang sun
> Attachments: HBASE-16981.master.001.patch, 
> HBASE-16981.master.002.patch, HBASE-16981.master.003.patch, 
> Supportingweeklyandmonthlymobcompactionpartitionpolicyinhbase.pdf
>
>
> Today the mob region holds all mob files for all regions. With the daily 
> partition mob compaction policy, after a major mob compaction there is still 
> one file per region per day. Given there are 365 days in a year, that is at 
> least 365 files per region. Since HDFS has a limitation on the number of 
> files under one folder, this will not scale if there are lots of regions. To 
> reduce the mob file count, we want to introduce other partition policies, 
> such as weekly and monthly, to compact the mob files within one week or month 
> into one file. This jira is created to track this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16981) Expand Mob Compaction Partition policy from daily to weekly, monthly and beyond

2016-12-21 Thread huaxiang sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huaxiang sun updated HBASE-16981:
-
Attachment: HBASE-16981.master.003.patch

> Expand Mob Compaction Partition policy from daily to weekly, monthly and 
> beyond
> ---
>
> Key: HBASE-16981
> URL: https://issues.apache.org/jira/browse/HBASE-16981
> Project: HBase
>  Issue Type: New Feature
>  Components: mob
>Affects Versions: 2.0.0
>Reporter: huaxiang sun
>Assignee: huaxiang sun
> Attachments: HBASE-16981.master.001.patch, 
> HBASE-16981.master.002.patch, HBASE-16981.master.003.patch, 
> Supportingweeklyandmonthlymobcompactionpartitionpolicyinhbase.pdf
>
>
> Today the mob region holds all mob files for all regions. With the daily 
> partition mob compaction policy, after a major mob compaction there is still 
> one file per region per day. Given there are 365 days in a year, that is at 
> least 365 files per region. Since HDFS has a limitation on the number of 
> files under one folder, this will not scale if there are lots of regions. To 
> reduce the mob file count, we want to introduce other partition policies, 
> such as weekly and monthly, to compact the mob files within one week or month 
> into one file. This jira is created to track this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17018) Spooling BufferedMutator

2016-12-21 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768703#comment-15768703
 ] 

Enis Soztutar commented on HBASE-17018:
---

Thanks for entertaining my suggestion. 
bq. In our case we launch ~1K containers per second. If we write 100 metrics 
each, the total volume written into HBase is considerable
HBase in the end will end up writing these events to its own WALs on HDFS. So 
in terms of scalability, you should be able to achieve HBase throughput much 
more easily, since HBase is doing a lot more work (RPC, sorting data, flushing 
to disk, compaction, etc.). 
bq. For large deployments that means that there could be hundreds of parallel 
writers.
That should be fine for HDFS, as long as you have 1 writer per application, 
rather than 1 writer per task. 
bq. It would essentially double the HDFS requirement for the storage
I was thinking that you would delete the records once the reader has persisted 
them to HBase. If the application writer is dead, some other application writer 
eventually finishes persisting to HBase (because the WALs are already there in 
HDFS). For example, HBase keeps rolling the WAL to a new file every ~100MB. 
Then the whole file is deleted once we determine that it is not needed anymore. 
bq. Would the reader still query HBase only and return no data if HBase is 
missing the data?
I think that is determined by the requirements for ATS. You have to determine 
the "commit point" and the read-point semantics. For example, you can make the 
commit point the HDFS write: once it is complete, you ACK the write, which 
means the HBase write will be "eventually consistent", with the benefit of not 
depending on HBase availability. Or you can make the commit point the HDFS 
write plus waiting up to 30 seconds for the HBase write; in that case you wait 
for HBase for 30 seconds, but still ACK the write once it hits HDFS after the 
timeout. It also depends on whether you need read-after-write semantics. If so, 
maybe you keep an in-memory cache of the items waiting to be written to HBase. 
Not sure on the ATS requirements.
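
To round out the second option, a toy model of "HDFS write + bounded wait for 
HBase" (every name below is a stand-in, not an ATS or HBase API):
{code}
import java.util.concurrent.*;

// Toy model of the commit-point choice described above.
public class CommitPointSketch {
  static final ExecutorService POOL = Executors.newFixedThreadPool(2);

  static Future<Void> writeToWal(String event) {   // durable HDFS spool (stub)
    return POOL.submit(() -> null);
  }

  static Future<Void> writeToHBase(String event) { // indexed-store write (stub)
    return POOL.submit(() -> null);
  }

  public static void main(String[] args) throws Exception {
    Future<Void> wal = writeToWal("event");
    Future<Void> hbase = writeToHBase("event");
    wal.get();                         // commit point: the HDFS write is durable
    try {
      hbase.get(30, TimeUnit.SECONDS); // bounded wait for the HBase write
    } catch (TimeoutException te) {
      // ACK anyway; the spooled WAL lets HBase catch up later (eventual consistency)
    }
    System.out.println("write ACKed");
    POOL.shutdown();
  }
}
{code}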





> Spooling BufferedMutator
> 
>
> Key: HBASE-17018
> URL: https://issues.apache.org/jira/browse/HBASE-17018
> Project: HBase
>  Issue Type: New Feature
>Reporter: Joep Rottinghuis
> Attachments: HBASE-17018.master.001.patch, 
> HBASE-17018.master.002.patch, HBASE-17018.master.003.patch, 
> HBASE-17018.master.004.patch, 
> HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf, YARN-4061 HBase requirements 
> for fault tolerant writer.pdf
>
>
> For Yarn Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is 
> (temporarily) down, for example in case of an HBase upgrade.
> Most of the high-volume writes will be on a best-effort basis, but 
> occasionally we do a flush. Mainly during application lifecycle events, 
> clients will call a flush on the timeline service API. In order to handle the 
> volume of writes we use a BufferedMutator. When flush gets called on our API, 
> we in turn call flush on the BufferedMutator.
> We would like our interface to HBase to be able to spool the mutations to a 
> filesystem in case of HBase errors. If we use the Hadoop filesystem 
> interface, this can then be HDFS, gcs, s3, or any other distributed storage. 
> The mutations can then later be re-played, for example through a MapReduce 
> job.
> https://reviews.apache.org/r/54882/
> For design of SpoolingBufferedMutatorImpl see 
> https://docs.google.com/document/d/1GTSk1Hd887gGJduUr8ZJ2m-VKrIXDUv9K3dr4u2YGls/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768700#comment-15768700
 ] 

Hudson commented on HBASE-17341:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #84 (See 
[https://builds.apache.org/job/HBase-1.3-JDK8/84/])
HBASE-17341 Add a timeout during replication endpoint termination (apurtell: 
rev 16583cd4f9a3219ce710d180447547b890268bf1)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java


> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17001) [RegionServer] Implement enforcement of quota violation policies

2016-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768687#comment-15768687
 ] 

Hadoop QA commented on HBASE-17001:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} HBASE-17001 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12844328/HBASE-17001.003.patch 
|
| JIRA Issue | HBASE-17001 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5016/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> [RegionServer] Implement enforcement of quota violation policies
> 
>
> Key: HBASE-17001
> URL: https://issues.apache.org/jira/browse/HBASE-17001
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 2.0.0
>
> Attachments: HBASE-17001.001.patch, HBASE-17001.003.patch
>
>
> When the master enacts a quota violation policy, the RegionServers need to 
> actually enforce that policy per its definition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768688#comment-15768688
 ] 

Hudson commented on HBASE-17341:


SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #76 (See 
[https://builds.apache.org/job/HBase-1.2-JDK8/76/])
HBASE-17341 Add a timeout during replication endpoint termination (apurtell: 
rev 18dc7386bc9adff834db851a38306989fb3fd4a6)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17001) [RegionServer] Implement enforcement of quota violation policies

2016-12-21 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-17001:
---
Attachment: HBASE-17001.003.patch

.003 This was a brutal re-write.

It turns out that to support the "proactive rejection" of bulk loads that would 
violate a quota, we need to start tracking the quota information much 
differently. We have to know both the current size of a table and what it is 
allowed to be (the current quota limit).

There was a bit of cleanup along the way that was beneficial. Overall, a good 
exercise at least.
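
A loose sketch of that bookkeeping (names and shapes invented for illustration, 
not the patch):
{code}
import java.util.concurrent.ConcurrentHashMap;

// Sketch: track {current size, limit} per table so a bulk load that would
// push the table over its quota can be rejected before it happens.
public class QuotaSnapshotSketch {
  static final ConcurrentHashMap<String, long[]> TABLE_USAGE = new ConcurrentHashMap<>();

  static boolean wouldViolate(String table, long bulkLoadSize) {
    long[] usage = TABLE_USAGE.getOrDefault(table, new long[] { 0L, Long.MAX_VALUE });
    return usage[0] + bulkLoadSize > usage[1]; // proactively reject the load
  }

  public static void main(String[] args) {
    TABLE_USAGE.put("t1", new long[] { 9_000L, 10_000L }); // 9KB used, 10KB allowed
    System.out.println(wouldViolate("t1", 2_000L));        // true -> reject
  }
}
{code}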

> [RegionServer] Implement enforcement of quota violation policies
> 
>
> Key: HBASE-17001
> URL: https://issues.apache.org/jira/browse/HBASE-17001
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 2.0.0
>
> Attachments: HBASE-17001.001.patch, HBASE-17001.003.patch
>
>
> When the master enacts a quota violation policy, the RegionServers need to 
> actually enforce that policy per its definition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768562#comment-15768562
 ] 

Hudson commented on HBASE-17341:


SUCCESS: Integrated in Jenkins build HBase-1.1-JDK7 #1829 (See 
[https://builds.apache.org/job/HBase-1.1-JDK7/1829/])
HBASE-17341 Add a timeout during replication endpoint termination (apurtell: 
rev 1999c15a9adf774c39478d181accd6a15bdf29ff)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768515#comment-15768515
 ] 

Hudson commented on HBASE-17341:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #74 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/74/])
HBASE-17341 Add a timeout during replication endpoint termination (apurtell: 
rev 16583cd4f9a3219ce710d180447547b890268bf1)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java


> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768487#comment-15768487
 ] 

Hudson commented on HBASE-17341:


SUCCESS: Integrated in Jenkins build HBase-1.1-JDK8 #1913 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1913/])
HBASE-17341 Add a timeout during replication endpoint termination (apurtell: 
rev 1999c15a9adf774c39478d181accd6a15bdf29ff)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java


> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17018) Spooling BufferedMutator

2016-12-21 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768468#comment-15768468
 ] 

Sangjin Lee commented on HBASE-17018:
-

Your suggestion is interesting [~enis]. Thanks for the idea.

In addition to what Joep mentioned above, I do worry about the capacity 
requirement a dual-writing system would have. It would essentially double the 
HDFS requirement for the storage, and at large scale it would add up to a 
meaningful amount.

Also, how would a reader work in the case where the data made it into HDFS but 
not into HBase (e.g. the HBase cluster was down for a while for an upgrade)? 
Would the reader still query HBase only and return no data if HBase is missing 
the data? If we want to address that situation, we're putting back the 
unspooling (migrating missing data from the backup location to HBase). I'm just 
trying to round out the idea... Thanks!

> Spooling BufferedMutator
> 
>
> Key: HBASE-17018
> URL: https://issues.apache.org/jira/browse/HBASE-17018
> Project: HBase
>  Issue Type: New Feature
>Reporter: Joep Rottinghuis
> Attachments: HBASE-17018.master.001.patch, 
> HBASE-17018.master.002.patch, HBASE-17018.master.003.patch, 
> HBASE-17018.master.004.patch, 
> HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf, YARN-4061 HBase requirements 
> for fault tolerant writer.pdf
>
>
> For Yarn Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is 
> (temporarily) down, for example in case of an HBase upgrade.
> Most of the high-volume writes will be on a best-effort basis, but 
> occasionally we do a flush. Mainly during application lifecycle events, 
> clients will call a flush on the timeline service API. In order to handle the 
> volume of writes we use a BufferedMutator. When flush gets called on our API, 
> we in turn call flush on the BufferedMutator.
> We would like our interface to HBase to be able to spool the mutations to a 
> filesystem in case of HBase errors. If we use the Hadoop filesystem 
> interface, this can then be HDFS, gcs, s3, or any other distributed storage. 
> The mutations can then later be re-played, for example through a MapReduce 
> job.
> https://reviews.apache.org/r/54882/
> For design of SpoolingBufferedMutatorImpl see 
> https://docs.google.com/document/d/1GTSk1Hd887gGJduUr8ZJ2m-VKrIXDUv9K3dr4u2YGls/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768453#comment-15768453
 ] 

Hudson commented on HBASE-17314:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2174 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2174/])
Revert "HBASE-17314 Limit total buffered size for all replication (stack: rev 
a1d2ff4646743a9136bb1182c0512bce28e358b7)
* (delete) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestGlobalThrottler.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationEndpoint.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java


> Limit total buffered size for all replication sources
> -
>
> Key: HBASE-17314
> URL: https://issues.apache.org/jira/browse/HBASE-17314
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, 
> HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch
>
>
> If we have many peers, or some servers have many recovered queues, we will 
> hold many entries in memory, which increases GC pressure and may even cause 
> OOM, because by default we read up to 64MB of entries into the buffer for one 
> source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-5401) PerformanceEvaluation generates 10x the number of expected mappers

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768454#comment-15768454
 ] 

Hudson commented on HBASE-5401:
---

FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2174 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2174/])
HBASE-5401 PerformanceEvaluation generates 10x the number of expected (stack: 
rev d787155fd24c576b3220372dbb7286d5e291)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/TestPerformanceEvaluation.java


> PerformanceEvaluation generates 10x the number of expected mappers
> --
>
> Key: HBASE-5401
> URL: https://issues.apache.org/jira/browse/HBASE-5401
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0
>Reporter: Oliver Meyn
>Assignee: Yi Liang
> Fix For: 2.0.0
>
> Attachments: HBASE-5401-V1.patch
>
>
> With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation 
> randomWrite 10' there are 100 mappers spawned, rather than the expected 10.  
> The culprit appears to be the outer loop in writeInputFile which sets up 10 
> splits for every "asked-for client".  I think the fix is just to remove that 
> outer loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16010) Put draining function through Admin API

2016-12-21 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768204#comment-15768204
 ] 

Jerry He commented on HBASE-16010:
--

Hi, [~enis]

bq. I think we should have the ACL changes in this patch as well. Otherwise, it 
will get forgotten and leave a security hole
We surely need to get the ACL in, but let's get this JIRA with the protobuf 
changes in first? Mixing the ACL observers and the protobuf changes would 
probably bloat the patch and be confusing.
Let me open a subtask right away and make sure it will be in.

Currently, the decommission works this way (I played with it recently):
1. Put the server in drain mode.
2. Move the regions off with the region mover.

Do you think we should combine the two steps into one?

bq. "decommissioning" a server should be integral to the new assignment manager 
in the sense that the core assignment should be aware of decommissioning 
servers. 
I think currently, if a server is in drain mode, 
serverManager/assignment/balancer will skip it as a candidate server. But I am 
not sure about the details.
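
For reference, a hypothetical sketch of what step 1 could look like through the 
proposed Admin API (the method name drainRegionServers is assumed from this 
discussion, not a settled API):
{code}
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Hypothetical usage of the Admin-level draining API under review.
public class DrainSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    ServerName server = ServerName.valueOf(args[0]); // e.g. "host,16020,1482300000000"
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      admin.drainRegionServers(Arrays.asList(server)); // step 1: mark draining
      // step 2 (moving regions off) is still done externally today, e.g. the
      // region mover script; the question above is whether to fold it in here.
    }
  }
}
{code}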

> Put draining function through Admin API
> ---
>
> Key: HBASE-16010
> URL: https://issues.apache.org/jira/browse/HBASE-16010
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jerry He
>Assignee: Matt Warhaftig
>Priority: Minor
> Attachments: hbase-16010-v1.patch, hbase-16010-v2.patch
>
>
> Currently, there is no Admin API for the draining function. Clients have to 
> interact directly with the Zookeeper draining node to add and remove draining 
> servers.
> For example, in draining_servers.rb:
> {code}
>   zkw = org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.new(config, 
> "draining_servers", nil)
>   parentZnode = zkw.drainingZNode
>   begin
> for server in servers
>   node = ZKUtil.joinZNode(parentZnode, server)
>   ZKUtil.createAndFailSilent(zkw, node)
> end
>   ensure
> zkw.close()
>   end
> {code}
> This is not good in cases like secure clusters with protected Zookeeper nodes.
> Let's put draining function through Admin API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768275#comment-15768275
 ] 

Hudson commented on HBASE-17341:


FAILURE: Integrated in Jenkins build HBase-1.4 #576 (See 
[https://builds.apache.org/job/HBase-1.4/576/])
HBASE-17341 Add a timeout during replication endpoint termination (tedyu: rev 
f94180a3e9820761d59be98a62db9d218a096e5b)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-17357) PerformanceEvaluation parameters parsing triggers NPE.

2016-12-21 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-17357:
---

 Summary: PerformanceEvaluation parameters parsing triggers NPE.
 Key: HBASE-17357
 URL: https://issues.apache.org/jira/browse/HBASE-17357
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.2.4
Reporter: Jean-Marc Spaggiari
Priority: Minor


When using wrong parameters, PE triggers an NPE. It should not.

{code}
@hbasetest1:~# hbase pe --nomapred
16/12/21 16:38:50 INFO Configuration.deprecation: hadoop.native.lib is 
deprecated. Instead, use io.native.lib.available
java.lang.NullPointerException
at java.util.TreeMap.getEntry(TreeMap.java:342)
at java.util.TreeMap.get(TreeMap.java:273)
at 
org.apache.hadoop.hbase.PerformanceEvaluation.determineCommandClass(PerformanceEvaluation.java:2145)
at 
org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:2127)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at 
org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:2150)
{code}
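
For context, TreeMap.get(null) throws NPE under natural ordering because it 
compares keys, which matches the trace above. A minimal sketch of a guard (the 
actual fix in PE may well differ):
{code}
import java.util.TreeMap;

// Sketch: guard the null key before the TreeMap lookup instead of letting
// TreeMap.get(null) throw.
public class NpeSketch {
  static final TreeMap<String, String> COMMANDS = new TreeMap<>();

  static String determineCommandClass(String cmd) {
    return (cmd == null) ? null : COMMANDS.get(cmd); // null -> "unknown command"
  }

  public static void main(String[] args) {
    System.out.println(determineCommandClass(null)); // prints null instead of NPE
  }
}
{code}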



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16010) Put draining function through Admin API

2016-12-21 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768178#comment-15768178
 ] 

Jerry He commented on HBASE-16010:
--

bq. MasterRpcServices already has a ServerName import entry 
(org.apache.hadoop.hbase.ServerName) and they would conflict.
Use HBaseProtos.ServerName

> Put draining function through Admin API
> ---
>
> Key: HBASE-16010
> URL: https://issues.apache.org/jira/browse/HBASE-16010
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jerry He
>Assignee: Matt Warhaftig
>Priority: Minor
> Attachments: hbase-16010-v1.patch, hbase-16010-v2.patch
>
>
> Currently, there is no Admin API for the draining function. Clients have to 
> interact directly with the Zookeeper draining node to add and remove draining 
> servers.
> For example, in draining_servers.rb:
> {code}
>   zkw = org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.new(config, 
> "draining_servers", nil)
>   parentZnode = zkw.drainingZNode
>   begin
> for server in servers
>   node = ZKUtil.joinZNode(parentZnode, server)
>   ZKUtil.createAndFailSilent(zkw, node)
> end
>   ensure
> zkw.close()
>   end
> {code}
> This is not good in cases like secure clusters with protected Zookeeper nodes.
> Let's put draining function through Admin API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-5401) PerformanceEvaluation generates 10x the number of expected mappers

2016-12-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5401:
-
Hadoop Flags: Incompatible change,Reviewed  (was: Reviewed)

Marked it an incompatible change.

> PerformanceEvaluation generates 10x the number of expected mappers
> --
>
> Key: HBASE-5401
> URL: https://issues.apache.org/jira/browse/HBASE-5401
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0
>Reporter: Oliver Meyn
>Assignee: Yi Liang
> Fix For: 2.0.0
>
> Attachments: HBASE-5401-V1.patch
>
>
> With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation 
> randomWrite 10' there are 100 mappers spawned, rather than the expected 10.  
> The culprit appears to be the outer loop in writeInputFile which sets up 10 
> splits for every "asked-for client".  I think the fix is just to remove that 
> outer loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-5401) PerformanceEvaluation generates 10x the number of expected mappers

2016-12-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5401:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
Release Note: Changes how many tasks PE runs when clients are mapreduce. 
Now tasks == client count. Previously we hardcoded ten tasks per client 
instance.
  Status: Resolved  (was: Patch Available)

Pushed. Makes sense. This baffled you and Oliver. That's enough. Thanks for the 
patch [~easyliangjob]

> PerformanceEvaluation generates 10x the number of expected mappers
> --
>
> Key: HBASE-5401
> URL: https://issues.apache.org/jira/browse/HBASE-5401
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0
>Reporter: Oliver Meyn
>Assignee: Yi Liang
> Fix For: 2.0.0
>
> Attachments: HBASE-5401-V1.patch
>
>
> With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation 
> randomWrite 10' there are 100 mappers spawned, rather than the expected 10.  
> The culprit appears to be the outer loop in writeInputFile which sets up 10 
> splits for every "asked-for client".  I think the fix is just to remove that 
> outer loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-17341:
---
   Resolution: Fixed
Fix Version/s: 0.98.24
   1.1.8
   1.2.5
   1.3.0
   Status: Resolved  (was: Patch Available)

> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 1.1.8, 0.98.24
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767960#comment-15767960
 ] 

Hudson commented on HBASE-17341:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2173 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2173/])
HBASE-17341 Add a timeout during replication endpoint termination (tedyu: rev 
cac0904c16dde9eb7bdbb57e4a33224dd4edb77f)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17352) Fix hbase-assembly build with bash 4

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767961#comment-15767961
 ] 

Hudson commented on HBASE-17352:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2173 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2173/])
HBASE-17352 Fix hbase-assembly build with bash 4 (Junegunn Choi) (tedyu: rev 
acd0218d91bac9410f7b9bc68f66aa065fd47d55)
* (edit) hbase-assembly/pom.xml


> Fix hbase-assembly build with bash 4
> 
>
> Key: HBASE-17352
> URL: https://issues.apache.org/jira/browse/HBASE-17352
> Project: HBase
>  Issue Type: Bug
>Reporter: Junegunn Choi
>Assignee: Junegunn Choi
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17352.patch
>
>
> hbase-assembly fails to build with bash 4.
> {noformat}
> [DEBUG] Executing command line: [env, bash, -c, cat 
> maven-shared-archive-resources/META-INF/NOTICE \
>   `find 
> /Users/jg/github/hbase/hbase-assembly/target/dependency -iname NOTICE -or 
> -iname NOTICE.txt` \]
> [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec 
> (concat-NOTICE-files) on project hbase-assembly: Command execution failed. 
> Process exited with an error: 1 (Exit value: 1) -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (concat-NOTICE-files) on 
> project hbase-assembly: Command execution failed.
> {noformat}
> The error is caused by the trailing backslash in the bash command for 
> {{concat-NOTICE-files}}. You can see the behavioral difference between bash 3 
> and 4 with the following snippet.
> {code}
> $ # Using bash 3
> $ /bin/bash -c 'cat <(echo foo) \' && echo good || echo bad
> foo
> good
> $ # Using bash 4
> $ /usr/local/bin/bash -c 'cat <(echo foo) \' && echo good || echo bad
> foo
> cat: \: No such file or directory
> bad
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17355) Create a simplifed version of flush scanner

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767948#comment-15767948
 ] 

stack commented on HBASE-17355:
---

Nice experiment. What diff do you see [~ram_krish]? (How are you reading the 
profiling?)

> Create a simplifed version of flush scanner
> ---
>
> Key: HBASE-17355
> URL: https://issues.apache.org/jira/browse/HBASE-17355
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-17354.patch, after patch.png, before patch.png
>
>
> Currently we use StoreScanner for performing flushes, which actually goes 
> row by row. Probably that is not needed, and we could always go ahead with a 
> simple loop, collecting the cells and writing them to the file. Inside the 
> write path we have the required sanity checks, so it is not needed that the 
> store scanner does a sanity check as well. 
> Also, the limit that could be retrieved in one next() call could be 
> equivalent to the configured block size, as we do for compaction.
> Are there any filters that we want to apply (I mean any version check or 
> deletion) that we need to check in flush? If so then this simplified version 
> will not work. I may be missing something, but if so we need to see what 
> those are and add them here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17338) Treat Cell data size under global memstore heap size only when that Cell can not be copied to MSLAB

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767923#comment-15767923
 ] 

stack commented on HBASE-17338:
---

Do we have to check whether the MSLAB is on or off heap?

/**
 * @return Whether off heap based MSLAB in place.
 */
boolean isOffheap();

Can we not have MSLAB work the same whether on or off heap?

This sort of check...

// issues or even OOME.
if (this.memStoreLAB != null && this.memStoreLAB.isOffheap()) {
  heapOverheadDelta += cellLen;
}

... presumes that MSLAB is done in either of two ways. This check is done apart 
from the implementation.
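
One possible reading of that suggestion, sketched very loosely (entirely 
hypothetical, not from the patch; which value each implementation returns is 
itself an assumption about the intended accounting):
{code}
// Sketch: make the heap-accounting question part of the MSLAB contract so
// call sites do not have to branch on isOffheap().
interface MemStoreLABSketch {
  /** Bytes of this cell to count toward the global heap overhead. */
  long heapBytesFor(int cellLen);
}

class OnHeapLABSketch implements MemStoreLABSketch {
  @Override
  public long heapBytesFor(int cellLen) {
    return 0L; // cell data already accounted under the heap data size
  }
}

class OffHeapLABSketch implements MemStoreLABSketch {
  @Override
  public long heapBytesFor(int cellLen) {
    return cellLen; // cell data lives off heap, so count it explicitly
  }
}
{code}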

Is there copy/paste of code (going by your dup'ing of the comment)?

I need to read up on why Append/Increment can't go offheap.

This is good stuff though [~anoop.hbase]



> Treat Cell data size under global memstore heap size only when that Cell can 
> not be copied to MSLAB
> ---
>
> Key: HBASE-17338
> URL: https://issues.apache.org/jira/browse/HBASE-17338
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
> Fix For: 2.0.0
>
> Attachments: HBASE-17338.patch
>
>
> We have only data size and heap overhead being tracked globally. An off heap 
> memstore works with an off heap backed MSLAB pool. But a cell, when added to 
> the memstore, is not always copied to MSLAB. Append/Increment ops doing an 
> upsert don't use MSLAB. Also, based on the cell size, we sometimes avoid the 
> MSLAB copy. But right now we track these cells' data size under the global 
> memstore data size, which indicates the off heap size in the case of an off 
> heap memstore. For the global flush checks (against lower/upper watermark 
> levels), we check this size against the max off heap memstore size. We do 
> check heap overhead against the global heap memstore size (defaults to 40% 
> of xmx), but for such cells the data size should also be accounted under the 
> heap overhead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16010) Put draining function through Admin API

2016-12-21 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767918#comment-15767918
 ] 

Enis Soztutar commented on HBASE-16010:
---

I think we should have the ACL changes in this patch as well. Otherwise, it 
will get forgotten and leave a security hole. 

Instead of draining, we should use the terms decommission/recommission, I 
think. And GetDrainingServers should be ListDrainingServers, or 
ListDecommissionedServers. 

This is obviously already broken, but the new API right now only puts the 
server in "draining" mode and does not do anything else. Is there a plan to 
bring the actual functionality (of moving regions out of the RS) into the 
master as well? As I have noted elsewhere, "decommissioning" a server should be 
integral to the new assignment manager, in the sense that the core assignment 
should be aware of decommissioning servers. [~stack], [~syuanjiang] what do you 
guys think? Does the current stuff have ways to address that? 

> Put draining function through Admin API
> ---
>
> Key: HBASE-16010
> URL: https://issues.apache.org/jira/browse/HBASE-16010
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jerry He
>Assignee: Matt Warhaftig
>Priority: Minor
> Attachments: hbase-16010-v1.patch, hbase-16010-v2.patch
>
>
> Currently, there is no Admin API for the draining function. Clients have to 
> interact directly with the Zookeeper draining node to add and remove draining 
> servers.
> For example, in draining_servers.rb:
> {code}
>   zkw = org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.new(config, 
> "draining_servers", nil)
>   parentZnode = zkw.drainingZNode
>   begin
> for server in servers
>   node = ZKUtil.joinZNode(parentZnode, server)
>   ZKUtil.createAndFailSilent(zkw, node)
> end
>   ensure
> zkw.close()
>   end
> {code}
> This is not good in cases like secure clusters with protected Zookeeper nodes.
> Let's put draining function through Admin API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767906#comment-15767906
 ] 

Andrew Purtell commented on HBASE-17341:


This is flagged as a critical fix and we hit it in production, so I'm going to 
commit everywhere.

> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767905#comment-15767905
 ] 

stack commented on HBASE-17314:
---

Reverted. Reopened.

> Limit total buffered size for all replication sources
> -
>
> Key: HBASE-17314
> URL: https://issues.apache.org/jira/browse/HBASE-17314
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, 
> HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch
>
>
> If we have many peers, or some servers have many recovered queues, we will 
> hold many entries in memory, which increases GC pressure and may even cause 
> OOM, because by default we read up to 64MB of entries into the buffer for one 
> source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-17314) Limit total buffered size for all replication sources

2016-12-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-17314:
---

Let me revert. The failing test is messing up other devs. Can reapply w/ 
addendum no problem.

> Limit total buffered size for all replication sources
> -
>
> Key: HBASE-17314
> URL: https://issues.apache.org/jira/browse/HBASE-17314
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, 
> HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch
>
>
> If we have many peers, or some servers have many recovered queues, we will 
> hold many entries in memory, which increases GC pressure and may even cause 
> OOM, because by default we read up to 64MB of entries into the buffer for one 
> source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767877#comment-15767877
 ] 

stack commented on HBASE-17341:
---

Thanks for the clarification [~vincentpoon]. If there is no data loss and we 
retry, WARN seems fine by me.

> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.
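For illustration, the short-term fix sketched in Java. The timeout value and variable names are assumptions, not the committed patch:

{code}
// Sketch only: bound the wait on the endpoint's stop future so the calling
// thread (possibly the ZK event thread) can never hang indefinitely.
Future<?> stopFuture = replicationEndpoint.stop();
try {
  stopFuture.get(terminationTimeoutMs, TimeUnit.MILLISECONDS); // hypothetical config, e.g. 30s
} catch (TimeoutException e) {
  // Give up waiting; log and move on instead of blocking the event thread.
  LOG.warn("Endpoint did not stop within " + terminationTimeoutMs + "ms", e);
} catch (InterruptedException | ExecutionException e) {
  LOG.warn("Failed waiting for replication endpoint to stop", e);
}
{code}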



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Vincent Poon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767791#comment-15767791
 ] 

Vincent Poon commented on HBASE-17341:
--

[~stack] I don't believe we'll lose data if we time out, even during shipping.  
source#shipEdits() doesn't remove entries from the queue until 
endpoint#replicate() returns success.  So at worst, you ship the data more than 
once.

Looking at it now I suppose ERROR would make sense, though WARN is no worse 
than what was there before.
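In other words, roughly the following. This is a simplified sketch, not the actual shipEdits() code, and the queue helpers are hypothetical names:

{code}
// Simplified: a batch stays in the queue until replicate() reports success,
// so a timeout during shipping can only cause re-shipping, never data loss.
while (isActive()) {
  List<WAL.Entry> batch = peekBatchFromQueue();   // hypothetical helper; does not remove
  ReplicationEndpoint.ReplicateContext ctx =
      new ReplicationEndpoint.ReplicateContext().setEntries(batch);
  if (replicationEndpoint.replicate(ctx)) {
    removeBatchAndAdvanceLogPosition(batch);      // hypothetical; only now is it dropped
  }
  // On failure or timeout, loop and ship the same batch again (at-least-once).
}
{code}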

> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17334) Add locate row before/after support for AsyncRegionLocator

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767773#comment-15767773
 ] 

stack commented on HBASE-17334:
---

Skimmed. LGTM. I like the enum for what type of location.

Below are nits that can be addressed on commit or in next patch... 

It just gets a bit confusing when you add a shortcut for 'before'

  this.before = before;

... rather than testing the enum for == BEFORE.

nit: do a switch instead of if/else:

if (type == RegionLocateType.BEFORE) {

RegionLocateType is a nice improvement.
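i.e., something like this (sketch only; the locate helpers are made-up names, not the patch's methods):

{code}
// Sketch of the suggested switch over RegionLocateType instead of an if/else
// chain on a boolean 'before' shortcut.
switch (type) {
  case BEFORE:
    return locateRowBefore(tableName, row);   // region containing the row just before
  case CURRENT:
    return locateRow(tableName, row);         // region containing the row itself
  case AFTER:
    return locateRowAfter(tableName, row);    // region containing the row just after
  default:
    throw new IllegalArgumentException("Unknown locate type: " + type);
}
{code}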

> Add locate row before/after support for AsyncRegionLocator
> --
>
> Key: HBASE-17334
> URL: https://issues.apache.org/jira/browse/HBASE-17334
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17334-v1.patch, HBASE-17334.patch
>
>
> Now we only have a getPreviousRegionLocation method which is only used for 
> reverse scan, and it is not perfect as it can not deal with region merge. As 
> we want to add inclusive/exclusive support for start row and end row of a 
> scan, we need to implement general locate to row before/after method for 
> AsyncRegionLocator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15130) Backport 0.98 Scan different TimeRange for each column family

2016-12-21 Thread churro morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

churro morales updated HBASE-15130:
---
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

> Backport 0.98 Scan different TimeRange for each column family 
> --
>
> Key: HBASE-15130
> URL: https://issues.apache.org/jira/browse/HBASE-15130
> Project: HBase
>  Issue Type: Bug
>  Components: Client, regionserver, Scanners
>Affects Versions: 0.98.17
>Reporter: churro morales
>Assignee: churro morales
> Fix For: 0.98.24
>
> Attachments: HBASE-15130-0.98.patch, HBASE-15130-0.98.v1.patch, 
> HBASE-15130-0.98.v1.patch, HBASE-15130-0.98.v2.patch, 
> HBASE-15130-0.98.v3.patch, HBASE-15130-0.98.v4.patch
>
>
> branch 98 version backport for HBASE-14355



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767757#comment-15767757
 ] 

Andrew Purtell commented on HBASE-17341:


We can do an addendum if warranted. 

> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances

2016-12-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767751#comment-15767751
 ] 

Andrew Purtell commented on HBASE-17069:


Alright, let me run the head of branch-1.2 and see if it repros with TRACE-level 
logging. 

> RegionServer writes invalid META entries for split daughters in some 
> circumstances
> --
>
> Key: HBASE-17069
> URL: https://issues.apache.org/jira/browse/HBASE-17069
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.4
>Reporter: Andrew Purtell
>Priority: Critical
> Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, 
> daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, 
> parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. 
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta 
> inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta 
> inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure: 
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: 
> Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, 
> ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for big: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, 
> currentSize=17187656, freeSize=12821524664, maxSize=12838712320, 
> heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960

[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767745#comment-15767745
 ] 

stack commented on HBASE-17069:
---

Either. Need to dig in on this issue. Need to make a start somewhere. I don't 
mind doing the digging if you are doing the running of the test. [~apurtell]


> RegionServer writes invalid META entries for split daughters in some 
> circumstances
> --
>
> Key: HBASE-17069
> URL: https://issues.apache.org/jira/browse/HBASE-17069
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.4
>Reporter: Andrew Purtell
>Priority: Critical
> Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, 
> daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, 
> parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. 
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta 
> inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta 
> inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure: 
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: 
> Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, 
> ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for big: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, 
> currentSize=17187656, freeSize=12821524664, maxSize=12838712320, 
> heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=128

[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances

2016-12-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767740#comment-15767740
 ] 

Andrew Purtell commented on HBASE-17069:


bq. They are for 1.3 run?
You want all logs from a 1.3 run? I can redo
Or 1.2. 
Made to order. 


> RegionServer writes invalid META entries for split daughters in some 
> circumstances
> --
>
> Key: HBASE-17069
> URL: https://issues.apache.org/jira/browse/HBASE-17069
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.4
>Reporter: Andrew Purtell
>Priority: Critical
> Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, 
> daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, 
> parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. 
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta 
> inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta 
> inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure: 
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: 
> Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, 
> ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for big: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, 
> currentSize=17187656, freeSize=12821524664, maxSize=12838712320, 
> heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12

[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767736#comment-15767736
 ] 

stack commented on HBASE-17341:
---

[~vincentpoon] See above.

> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances

2016-12-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767734#comment-15767734
 ] 

Andrew Purtell commented on HBASE-17069:


bq.  I've not magic other than logging and asserts
I can run binaries for you.

bq. How long to repro?
Usually fails within a few hours, sometimes needs an overnight. 

> RegionServer writes invalid META entries for split daughters in some 
> circumstances
> --
>
> Key: HBASE-17069
> URL: https://issues.apache.org/jira/browse/HBASE-17069
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.4
>Reporter: Andrew Purtell
>Priority: Critical
> Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, 
> daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, 
> parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. 
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta 
> inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta 
> inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure: 
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: 
> Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, 
> ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for big: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, 
> currentSize=17187656, freeSize=12821524664, maxSize=12838712320, 
> heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, free

[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767730#comment-15767730
 ] 

stack commented on HBASE-17069:
---

You mean logs from November 10th [~apurtell]? They are for 1.3 run? I've not 
magic other than logging and asserts, unfortunately. How long to repro?

> RegionServer writes invalid META entries for split daughters in some 
> circumstances
> --
>
> Key: HBASE-17069
> URL: https://issues.apache.org/jira/browse/HBASE-17069
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.4
>Reporter: Andrew Purtell
>Priority: Critical
> Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, 
> daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, 
> parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. 
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta 
> inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta 
> inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure: 
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: 
> Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, 
> ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for big: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, 
> currentSize=17187656, freeSize=12821524664, maxSize=12838712320, 
> heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712

[jira] [Comment Edited] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767723#comment-15767723
 ] 

Andrew Purtell edited comment on HBASE-17341 at 12/21/16 6:13 PM:
--

Since Ted committed this I will pick to 0.98 now.

I missed it if there was an announcement that branch-1.3 is closed. I committed 
another of Vincent's replication fixes there yesterday. We should probably 
commit this one too now that the deed has been done. 


was (Author: apurtell):
Since Ted committed this I will pick to 0.98 now and resolve. 

I missed it if there was an announcement that branch-1.3 is closed. I committed 
another of Vincent's replication fixes there yesterday. We should probably 
commit this one too now that the deed has been done. 

> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767723#comment-15767723
 ] 

Andrew Purtell commented on HBASE-17341:


Since Ted committed this I will pick to 0.98 now and resolve. 

I missed it if there was an announcement that branch-1.3 is closed. I committed 
another of Vincent's replication fixes there yesterday. We should probably 
commit this one too now that the deed has been done. 

> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances

2016-12-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767711#comment-15767711
 ] 

Andrew Purtell commented on HBASE-17069:


I attached logs from one failed run with 1.2 on this issue. They indicate the 
problem but not the cause. (Or maybe I missed it.) I plan to look over the 
HBASE-14465 diff closely, and read the affected code in place, and probably 
introduce more logging temporarily in suspect places. Other suggestions?

> RegionServer writes invalid META entries for split daughters in some 
> circumstances
> --
>
> Key: HBASE-17069
> URL: https://issues.apache.org/jira/browse/HBASE-17069
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.4
>Reporter: Andrew Purtell
>Priority: Critical
> Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, 
> daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, 
> parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. 
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta 
> inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta 
> inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure: 
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: 
> Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, 
> ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for big: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, 
> currentSize=17187656, freeSize=12821524664, maxSize=12838712320, 
> heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO  
> [StoreOpener-d55ef81c2f8299abbddf

[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767699#comment-15767699
 ] 

stack commented on HBASE-17069:
---

[~apurtell] How we debug? Logs to look at or something?

> RegionServer writes invalid META entries for split daughters in some 
> circumstances
> --
>
> Key: HBASE-17069
> URL: https://issues.apache.org/jira/browse/HBASE-17069
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.4
>Reporter: Andrew Purtell
>Priority: Critical
> Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, 
> daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, 
> parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. 
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta 
> inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta 
> inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure: 
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: 
> Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, 
> ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for big: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, 
> currentSize=17187656, freeSize=12821524664, maxSize=12838712320, 
> heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFa

[jira] [Comment Edited] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances

2016-12-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767531#comment-15767531
 ] 

Andrew Purtell edited comment on HBASE-17069 at 12/21/16 5:59 PM:
--

FWIW I tested the head of branch-1.3 in my rig and it failed the same way, "no 
serialized HRegionInfo" in some rows in meta, with resulting job failure as 
part of the keyspace went missing. 
[~mantonov] [~ghelmling]


was (Author: apurtell):
FWIW I tested the head of branch-1.3 in my rig and it failed the same way, "no 
serialized HRegionInfo" in some rows in meta, with resulting job failure as 
part of the keyspace went missing. 

> RegionServer writes invalid META entries for split daughters in some 
> circumstances
> --
>
> Key: HBASE-17069
> URL: https://issues.apache.org/jira/browse/HBASE-17069
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.4
>Reporter: Andrew Purtell
>Priority: Critical
> Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, 
> daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, 
> parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. 
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta 
> inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta 
> inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure: 
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: 
> Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, 
> ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for big: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, 
> currentSize=17187656, freeSize=12821524664, maxSize=12838712320, 
> heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWri

[jira] [Commented] (HBASE-17345) Implement batch

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767632#comment-15767632
 ] 

stack commented on HBASE-17345:
---

{code}

72   * And the {@link #maxAttempts} is a limit for each single operation in the batch logically. In the
73   * implementation, we will record a {@code tries} parameter for each operation group, and if it is
74   * split to several groups when retrying, the sub groups will inherit the {@code tries}. You can
75   * imagine that the whole retrying process is a tree, and the {@link #maxAttempts} is the limit of
76   * the depth of the tree.
77   */
{code}

Trying to understand: the tree will only have a depth of one, i.e. a branch for 
each regionserver the batch is going against? Each branch can run up its own 
maxAttempts?  The tries is not shared amongst the branches? Regardless of how 
many retries, the operation will stop after operationTimeoutNs? If so, sounds 
good.
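A toy reading of the javadoc above, purely illustrative (none of these names are from the patch):

{code}
// Each retry may split an operation group per server; a sub group inherits the
// parent's tries, so the recursion depth for any single operation <= maxAttempts.
void groupAndSend(List<Action> actions, int tries) {
  if (tries > maxAttempts) {
    failAll(actions);                          // per-operation limit, logically
    return;
  }
  Map<ServerName, List<Action>> byServer = groupByServer(actions);
  byServer.forEach((server, group) ->
      send(server, group).whenComplete((resp, error) -> {  // send() is a hypothetical async helper
        if (error != null) {
          groupAndSend(group, tries + 1);      // sub group inherits (and bumps) tries
        }
      }));
}
{code}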

It has to be an Impl in the below, it can't be Interface?

  private final AsyncConnectionImpl conn;

What is up w/ below?

this.startLogErrorsCnt = 0;// startLogErrorsCnt;

We take startLogErrorsCnt as a param but ignore it?

You make a new Action from passed-in Action because you don't want to modify 
passed-in params?

139   Action action = new Action(rawAction, i);

super nit: you can presize the following:

  148 this.action2Errors = new IdentityHashMap<>();

Perhaps, if TRACE-level logging is enabled, log every attempt? 164  if (tries > 
startLogErrorsCnt) {

Is it right to set this to WARN since it might succeed on next attempt? 

LOG.warn("Process batch for "

... maybe I'm reading it wrong though?

nit: give this method a better name:

174   private String getExtras(ServerName serverName) {
175 return serverName != null ? serverName.getServerName() : "";
176   }

You should use the above method here?

4   serverName != null ? serverName.toString() : ""));

This is just to log?

  208 long currentTime = System.currentTimeMillis();

i.e. all timing is with nanos but millis is just for logging?

This is a crazy amount of work! I like how this patch is getting better on each 
iteration; i.e.   public MultiGetCallerBuilder multiGet() { becomes   
public BatchCallerBuilder batch() {

Skimmed after reading 1/4.

What do you see AsyncBatchRpcRetryingCaller replacing in our current stack? It 
seems to do AP and a bunch of our Callable infra. Should 
AsyncBatchRpcRetryingCaller implement Callable? Or what are you thinking?

Generally no * imports

1   import static org.apache.hadoop.hbase.client.ConnectionUtils.*;

Why do we have AsyncTable and AsyncTableBase again? Do we have to have the two 
Interfaces?

Do you have to rename TestAsyncGetMultiThread? And/or TestAsyncTableMultiGet?

This is nice work




> Implement batch
> ---
>
> Key: HBASE-17345
> URL: https://issues.apache.org/jira/browse/HBASE-17345
> Project: HBase
>  Issue Type: Sub-task
>  Components: asyncclient, Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17345.patch
>
>
> Add the support for general batch based on the code introduced in HBASE-17142.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17250) For Get and scan in one case, checkFamily can be skipped in Region#getScanner

2016-12-21 Thread huaxiang sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huaxiang sun updated HBASE-17250:
-
Status: Patch Available  (was: Open)

> For Get and scan in one case, checkFamily can be skipped in Region#getScanner
> -
>
> Key: HBASE-17250
> URL: https://issues.apache.org/jira/browse/HBASE-17250
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Minor
> Attachments: HBASE-17250-master-001.patch, 
> HBASE-17250-master-002.patch
>
>
> For get(), checkFamily is done in prepareGet(), so checkFamily can be skipped 
> in Region#getScanner(). For scan(), if there is no Family configured in the scan, 
> the families come from the table descriptor, so checkFamily in 
> Region#getScanner() can be skipped in this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17250) For Get and scan in one case, checkFamily can be skipped in Region#getScanner

2016-12-21 Thread huaxiang sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huaxiang sun updated HBASE-17250:
-
Attachment: HBASE-17250-master-002.patch

Submitted a patch. One issue with coprocessors is that they could change the 
families in the scan; to be safe, the families are rechecked in getScanner() 
after the coprocessor runs.
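
A sketch of that recheck, assuming Region#getScanner has the usual checkFamily helper in scope:

{code}
// after the preScannerOpen coprocessor hooks run, the Scan's family set may
// have changed, so validate it again against the table's column families
if (scan.hasFamilies()) {
  for (byte[] family : scan.getFamilyMap().keySet()) {
    checkFamily(family); // throws NoSuchColumnFamilyException for unknown families
  }
}
{code}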

> For Get and scan in one case, checkFamily can be skipped in Region#getScanner
> -
>
> Key: HBASE-17250
> URL: https://issues.apache.org/jira/browse/HBASE-17250
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: huaxiang sun
>Assignee: huaxiang sun
>Priority: Minor
> Attachments: HBASE-17250-master-001.patch, 
> HBASE-17250-master-002.patch
>
>
> For get(), checkFamily is done in prepareGet(), so checkFamily can be skipped 
> in Region#getScanner(). For scan(), if there is no Family configured in the scan, 
> the families come from the table descriptor, so checkFamily in 
> Region#getScanner() can be skipped in this case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17328) Properly dispose of looped replication peers

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767606#comment-15767606
 ] 

Hudson commented on HBASE-17328:


FAILURE: Integrated in Jenkins build HBase-0.98-on-Hadoop-1.1 #1299 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1299/])
HBASE-17328 Properly dispose of looped replication peers (apurtell: rev 
5ea953b0115ac814f67e4fb076b2fdce85dd22cf)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java


> Properly dispose of looped replication peers
> 
>
> Key: HBASE-17328
> URL: https://issues.apache.org/jira/browse/HBASE-17328
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 0.98.23
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.9
>
> Attachments: HBASE-17328-1.1.v1.patch, HBASE-17328-master.v1.patch, 
> HBASE-17328-master.v2.patch, HBASE-17328.0.98.v4.patch, 
> HBASE-17328.branch-1.1.v2.patch, HBASE-17328.branch-1.1.v3.patch, 
> HBASE-17328.branch-1.1.v4.patch, HBASE-17328.master.v4.patch
>
>
> When adding a looped replication peer (clusterId == peerClusterId), the 
> following code terminates the replication source thread, but since the source 
> manager still holds a reference, WALs continue to get enqueued, and never get 
> cleaned because they're stuck in the queue, leading to an unsustainable 
> buildup.  Furthermore, the replication statistics thread will continue to 
> print statistics for the terminated source.
> {code}
> if (clusterId.equals(peerClusterId) && 
> !replicationEndpoint.canReplicateToSameCluster()) {
>   this.terminate("ClusterId " + clusterId + " is replicating to itself: 
> peerClusterId "
>   + peerClusterId + " which is not allowed by ReplicationEndpoint:"
>   + replicationEndpoint.getClass().getName(), null, false);
> }
> {code}
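
The disposal presumably also has to drop the manager's reference; a hypothetical sketch (closeQueue is an assumed cleanup hook, not a confirmed API):

{code}
if (clusterId.equals(peerClusterId) && !replicationEndpoint.canReplicateToSameCluster()) {
  this.terminate("ClusterId " + clusterId + " is replicating to itself: peerClusterId "
      + peerClusterId + " which is not allowed by ReplicationEndpoint:"
      + replicationEndpoint.getClass().getName(), null, false);
  // assumed hook: remove this source and its queued WALs from the manager so
  // nothing keeps enqueueing to, or reporting stats for, a terminated source
  this.manager.closeQueue(this);
  return;
}
{code}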



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16008) A robust way deal with early termination of HBCK

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767607#comment-15767607
 ] 

Hudson commented on HBASE-16008:


FAILURE: Integrated in Jenkins build HBase-0.98-on-Hadoop-1.1 #1299 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1299/])
HBASE-16008 A robust way deal with early termination of HBCK (Stephen 
(apurtell: rev f63b5a0db9e630af69654fca59cf7ab3f724245f)
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* (edit) 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/SnapshotProtos.java
* (edit) hbase-protocol/src/main/protobuf/Master.proto
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterMaintenanceModeTracker.java
* (edit) 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProtos.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java


> A robust way deal with early termination of HBCK
> 
>
> Key: HBASE-16008
> URL: https://issues.apache.org/jira/browse/HBASE-16008
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0, 1.4.0, 0.98.24
>
> Attachments: HBASE-16008-0.98.patch, HBASE-16008.v0-master.patch, 
> HBASE-16008.v1-branch-1.patch, HBASE-16008.v1-master.patch
>
>
> When HBCK is running, we want to disable Catalog Janitor, Balancer and 
> Split/Merge.  Today, the implementation is not robust.  If HBCK is terminated 
> early by Control-C, the changed state would not be reset to the original.  
> HBASE-15406 was trying to solve this problem for Split/Merge switch.  The 
> implementation is complicated, and it did not solve CJ and Balancer.  
> The proposal to solve the problem is to use a znode to indicate that the HBCK 
> is running.  CJ, balancer, and Split/Merge switch all look for this znode 
> before doing their operations.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16010) Put draining function through Admin API

2016-12-21 Thread Matt Warhaftig (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767600#comment-15767600
 ] 

Matt Warhaftig commented on HBASE-16010:


Thanks Jerry. Feel free to fix the first issue and commit.

As for the second issue, MasterRpcServices uses the fully qualified classname 
of {{org.apache.hadoop.hbase.shaded.protobuf.generated.HBaseProtos.ServerName}} 
because MasterRpcServices already has a {{ServerName}} import entry 
({{org.apache.hadoop.hbase.ServerName}}) and they would conflict.
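
That is the standard workaround when two classes share a simple name; a sketch of the pattern (the conversion shown is illustrative):

{code}
import org.apache.hadoop.hbase.ServerName; // the frequently used type keeps the import

class Sketch {
  // the shaded protobuf type is spelled out at its (few) use sites instead
  org.apache.hadoop.hbase.shaded.protobuf.generated.HBaseProtos.ServerName toProto(ServerName sn) {
    return org.apache.hadoop.hbase.shaded.protobuf.generated.HBaseProtos.ServerName.newBuilder()
        .setHostName(sn.getHostname()).setPort(sn.getPort()).build();
  }
}
{code}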

> Put draining function through Admin API
> ---
>
> Key: HBASE-16010
> URL: https://issues.apache.org/jira/browse/HBASE-16010
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jerry He
>Assignee: Matt Warhaftig
>Priority: Minor
> Attachments: hbase-16010-v1.patch, hbase-16010-v2.patch
>
>
> Currently, there is no Admin API for the draining function. The client has to 
> interact directly with the Zookeeper draining node to add and remove draining 
> servers.
> For example, in draining_servers.rb:
> {code}
>   zkw = org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.new(config, 
> "draining_servers", nil)
>   parentZnode = zkw.drainingZNode
>   begin
> for server in servers
>   node = ZKUtil.joinZNode(parentZnode, server)
>   ZKUtil.createAndFailSilent(zkw, node)
> end
>   ensure
> zkw.close()
>   end
> {code}
> This is not good in cases like secure clusters with protected Zookeeper nodes.
> Let's put draining function through Admin API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17328) Properly dispose of looped replication peers

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767562#comment-15767562
 ] 

Hudson commented on HBASE-17328:


FAILURE: Integrated in Jenkins build HBase-0.98-matrix #428 (See 
[https://builds.apache.org/job/HBase-0.98-matrix/428/])
HBASE-17328 Properly dispose of looped replication peers (apurtell: rev 
5ea953b0115ac814f67e4fb076b2fdce85dd22cf)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java


> Properly dispose of looped replication peers
> 
>
> Key: HBASE-17328
> URL: https://issues.apache.org/jira/browse/HBASE-17328
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 0.98.23
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.9
>
> Attachments: HBASE-17328-1.1.v1.patch, HBASE-17328-master.v1.patch, 
> HBASE-17328-master.v2.patch, HBASE-17328.0.98.v4.patch, 
> HBASE-17328.branch-1.1.v2.patch, HBASE-17328.branch-1.1.v3.patch, 
> HBASE-17328.branch-1.1.v4.patch, HBASE-17328.master.v4.patch
>
>
> When adding a looped replication peer (clusterId == peerClusterId), the 
> following code terminates the replication source thread, but since the source 
> manager still holds a reference, WALs continue to get enqueued, and never get 
> cleaned because they're stuck in the queue, leading to an unsustainable 
> buildup.  Furthermore, the replication statistics thread will continue to 
> print statistics for the terminated source.
> {code}
> if (clusterId.equals(peerClusterId) && 
> !replicationEndpoint.canReplicateToSameCluster()) {
>   this.terminate("ClusterId " + clusterId + " is replicating to itself: 
> peerClusterId "
>   + peerClusterId + " which is not allowed by ReplicationEndpoint:"
>   + replicationEndpoint.getClass().getName(), null, false);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16008) A robust way deal with early termination of HBCK

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767563#comment-15767563
 ] 

Hudson commented on HBASE-16008:


FAILURE: Integrated in Jenkins build HBase-0.98-matrix #428 (See 
[https://builds.apache.org/job/HBase-0.98-matrix/428/])
HBASE-16008 A robust way deal with early termination of HBCK (Stephen 
(apurtell: rev f63b5a0db9e630af69654fca59cf7ab3f724245f)
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java
* (edit) 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProtos.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterMaintenanceModeTracker.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* (edit) hbase-protocol/src/main/protobuf/Master.proto
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java
* (edit) 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/SnapshotProtos.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java


> A robust way deal with early termination of HBCK
> 
>
> Key: HBASE-16008
> URL: https://issues.apache.org/jira/browse/HBASE-16008
> Project: HBase
>  Issue Type: Improvement
>  Components: hbck
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0, 1.4.0, 0.98.24
>
> Attachments: HBASE-16008-0.98.patch, HBASE-16008.v0-master.patch, 
> HBASE-16008.v1-branch-1.patch, HBASE-16008.v1-master.patch
>
>
> When HBCK is running, we want to disable Catalog Janitor, Balancer and 
> Split/Merge.  Today, the implementation is not robust.  If HBCK is terminated 
> early by Control-C, the changed state would not be reset to the original.  
> HBASE-15406 was trying to solve this problem for Split/Merge switch.  The 
> implementation is complicated, and it did not solve CJ and Balancer.  
> The proposal to solve the problem is to use a znode to indicate that the HBCK 
> is running.  CJ, balancer, and Split/Merge switch all look for this znode 
> before doing their operations.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances

2016-12-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767531#comment-15767531
 ] 

Andrew Purtell commented on HBASE-17069:


FWIW I tested the head of branch-1.3 in my rig and it failed the same way: "no 
serialized HRegionInfo" in some rows in meta, with a resulting job failure 
because part of the keyspace went missing. 

> RegionServer writes invalid META entries for split daughters in some 
> circumstances
> --
>
> Key: HBASE-17069
> URL: https://issues.apache.org/jira/browse/HBASE-17069
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.4
>Reporter: Andrew Purtell
>Priority: Critical
> Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, 
> daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, 
> parent-393d2bfd8b1c52ce08540306659624f2.log
>
>
> I have been seeing frequent ITBLL failures testing various versions of 1.2.x. 
> Over the lifetime of 1.2.x the following issues have been fixed:
> - HBASE-15315 (Remove always set super user call as high priority)
> - HBASE-16093 (Fix splits failed before creating daughter regions leave meta 
> inconsistent)
> And this one is pending:
> - HBASE-17044 (Fix merge failed before creating merged region leaves meta 
> inconsistent)
> I can apply all of the above to branch-1.2 and still see this failure: 
> *The life of stillborn region d55ef81c2f8299abbddfce0445067830*
> *Master sees SPLITTING_NEW*
> {noformat}
> 2016-11-08 04:23:21,186 INFO  [AM.ZK.Worker-pool2-t82] master.RegionStates: 
> Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, 
> ts=1478579001186, server=node-3.cluster,16020,1478578389506}
> {noformat}
> *The RegionServer creates it*
> {noformat}
> 2016-11-08 04:23:26,035 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,038 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for big: blockCache=LruBlockCache{blockCount=34, 
> currentSize=14996112, freeSize=12823716208, maxSize=12838712320, 
> heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,442 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, 
> currentSize=17187656, freeSize=12821524664, maxSize=12838712320, 
> heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,713 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,715 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, 
> currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, 
> multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false
> 2016-11-08 04:23:26,717 INFO  
> [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created 
> cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, 
> c

[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767514#comment-15767514
 ] 

stack commented on HBASE-17341:
---

If we timeout, is it a WARN or an ERROR? Do we lose data if we time out and 
just keep processing? Thanks. Good find.
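
For reference, a minimal sketch of the bounded wait under discussion; future, LOG, and the timeout variable are stand-ins from context, not the patch's exact names:

{code}
try {
  future.get(terminationTimeoutMs, TimeUnit.MILLISECONDS);
} catch (TimeoutException te) {
  // WARN and move on so the caller (possibly the ZK event thread) never hangs
  LOG.warn("Endpoint stop() did not finish within " + terminationTimeoutMs + "ms", te);
} catch (ExecutionException ee) {
  LOG.warn("Endpoint stop() failed", ee);
} catch (InterruptedException ie) {
  Thread.currentThread().interrupt(); // preserve interrupt status
}
{code}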

> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17352) Fix hbase-assembly build with bash 4

2016-12-21 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17352:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.0.0
   Status: Resolved  (was: Patch Available)

Thanks for the patch, Junegunn.

> Fix hbase-assembly build with bash 4
> 
>
> Key: HBASE-17352
> URL: https://issues.apache.org/jira/browse/HBASE-17352
> Project: HBase
>  Issue Type: Bug
>Reporter: Junegunn Choi
>Assignee: Junegunn Choi
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17352.patch
>
>
> hbase-assembly fails to build with bash 4.
> {noformat}
> [DEBUG] Executing command line: [env, bash, -c, cat 
> maven-shared-archive-resources/META-INF/NOTICE \
>   `find 
> /Users/jg/github/hbase/hbase-assembly/target/dependency -iname NOTICE -or 
> -iname NOTICE.txt` \]
> [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec 
> (concat-NOTICE-files) on project hbase-assembly: Command execution failed. 
> Process exited with an error: 1 (Exit value: 1) -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (concat-NOTICE-files) on 
> project hbase-assembly: Command execution failed.
> {noformat}
> The error is caused by the trailing backslash in the bash command for 
> {{concat-NOTICE-files}}. You can see the behavioral difference between bash 3 
> and 4 with the following snippet.
> {code}
> $ # Using bash 3
> $ /bin/bash -c 'cat <(echo foo) \' && echo good || echo bad
> foo
> good
> $ # Using bash 4
> $ /usr/local/bin/bash -c 'cat <(echo foo) \' && echo good || echo bad
> foo
> cat: \: No such file or directory
> bad
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources

2016-12-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767488#comment-15767488
 ] 

stack commented on HBASE-17314:
---

[~yangzhe1991] If making an addendum for the hanging test, here is some other input.

HConstants is for defines that are used in many places. The preference is to 
keep defines beside the code where they are used, so move these to 
ReplicationSource:

935   public static final String REPLICATION_SOURCE_TOTAL_BUFFER_KEY = 
"replication.total.buffer.quota";
936   public static final int REPLICATION_SOURCE_TOTAL_BUFFER_DFAULT = 256 
* 1024 * 1024;

Suggest too that the explanation for why 256M that you give above be written as 
a comment on the above define.
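
i.e. something like the following once moved; the comment wording is a sketch of the rationale given above, not the final text:

{code}
/**
 * Cap on the summed size of WAL entries buffered across ALL replication
 * sources on this region server. 256MB by default: a single source may
 * already buffer up to 64MB, so without a global quota many peers or
 * recovered queues can pressure the GC or even OOM the process.
 */
public static final String REPLICATION_SOURCE_TOTAL_BUFFER_KEY = "replication.total.buffer.quota";
public static final int REPLICATION_SOURCE_TOTAL_BUFFER_DFAULT = 256 * 1024 * 1024;
{code}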

This looks like it could be package private rather than public:

2344  public ReplicationSourceService getReplicationSourceService() {

Otherwise patch looks good.  Pity that replication is so hard to test. Any 
ideas on how to make it easier? Test could be hanging for any of many reasons 
given you have to put up two clusters inside one jvm.

> Limit total buffered size for all replication sources
> -
>
> Key: HBASE-17314
> URL: https://issues.apache.org/jira/browse/HBASE-17314
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, 
> HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch
>
>
> If we have many peers or some servers have many recovered queues, we will 
> hold many entries in memory, which will increase the pressure on GC, or even 
> OOM, because by default we will read up to 64MB of entries into the buffer 
> for one source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17341) Add a timeout during replication endpoint termination

2016-12-21 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-17341:
---
 Hadoop Flags: Reviewed
Fix Version/s: 1.4.0
   2.0.0

Integrated to branch-1 and master.

Waiting for branch-1.3 to open.

> Add a timeout during replication endpoint termination
> -
>
> Key: HBASE-17341
> URL: https://issues.apache.org/jira/browse/HBASE-17341
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17341.branch-1.1.v1.patch, 
> HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, 
> HBASE-17341.master.v2.patch
>
>
> In ReplicationSource#terminate(), a Future is obtained from 
> ReplicationEndpoint#stop().  Future.get() is then called, but can potentially 
> hang there if something went wrong in the endpoint stop().
> Hanging there has serious implications, because the thread could potentially 
> be the ZK event thread (e.g. watcher calls 
> ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> 
> blocked).  This means no other events in the ZK event queue will get 
> processed, which for HBase means other ZK watches such as replication watch 
> notifications, snapshot watch notifications, even RegionServer shutdown will 
> all get blocked.
> The short term fix addressed here is to simply add a timeout for 
> Future.get().  But the severe consequences seen here perhaps suggest a 
> broader refactoring of the ZKWatcher usage in HBase is in order, to protect 
> against situations like this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources

2016-12-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767290#comment-15767290
 ] 

Ted Yu commented on HBASE-17314:


TestGlobalThrottler hangs in master build.
Please investigate.

> Limit total buffered size for all replication sources
> -
>
> Key: HBASE-17314
> URL: https://issues.apache.org/jira/browse/HBASE-17314
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, 
> HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch
>
>
> If we have many peers or some servers have many recovered queues, we will 
> hold many entries in memory, which will increase the pressure on GC, or even 
> OOM, because by default we will read up to 64MB of entries into the buffer 
> for one source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17345) Implement batch

2016-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767185#comment-15767185
 ] 

Hadoop QA commented on HBASE-17345:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 
0s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
46s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
51s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
26m 42s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 57s 
{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 55s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 141m 41s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.replication.regionserver.TestGlobalThrottler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12844236/HBASE-17345.patch |
| JIRA Issue | HBASE-17345 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux 1a23f33321b5 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / e1f4aae |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5014/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/5014/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
htt

[jira] [Commented] (HBASE-17334) Add locate row before/after support for AsyncRegionLocator

2016-12-21 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767119#comment-15767119
 ] 

Yu Li commented on HBASE-17334:
---

+1 on v1 patch, only a trivial question on RB, thanks.

> Add locate row before/after support for AsyncRegionLocator
> --
>
> Key: HBASE-17334
> URL: https://issues.apache.org/jira/browse/HBASE-17334
> Project: HBase
>  Issue Type: Sub-task
>  Components: Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17334-v1.patch, HBASE-17334.patch
>
>
> Now we only have a getPreviousRegionLocation method which is only used for 
> reverse scan, and it is not perfect as it can not deal with region merges. As 
> we want to add inclusive/exclusive support for the start row and end row of a 
> scan, we need to implement a general locate-row-before/after method for 
> AsyncRegionLocator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17345) Implement batch

2016-12-21 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767026#comment-15767026
 ] 

Duo Zhang commented on HBASE-17345:
---

Seems the reviewboard is broken... I can not upload the patch there; it says 
that I need to use --full-index. But if I upload the patch generated with 
--full-index, it tells me 'The specified diff file could not be parsed.' No 
idea...

> Implement batch
> ---
>
> Key: HBASE-17345
> URL: https://issues.apache.org/jira/browse/HBASE-17345
> Project: HBase
>  Issue Type: Sub-task
>  Components: asyncclient, Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17345.patch
>
>
> Add the support for general batch based on the code introduced in HBASE-17142.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable

2016-12-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766913#comment-15766913
 ] 

Hadoop QA commented on HBASE-17262:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 
0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
50s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
29s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
40s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
25m 32s {color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 20s {color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s 
{color} | {color:green} hbase-it in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 132m 42s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.replication.regionserver.TestGlobalThrottler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12844217/HBASE-17262.master.V4.patch
 |
| JIRA Issue | HBASE-17262 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux b62bf154bd94 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / e1f4aae |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/5013/artifact/patchprocess/patch-unit-hbase-server.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HBASE-Build/5013/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Resu

[jira] [Created] (HBASE-17356) Add replica read support

2016-12-21 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-17356:
-

 Summary: Add replica read support
 Key: HBASE-17356
 URL: https://issues.apache.org/jira/browse/HBASE-17356
 Project: HBase
  Issue Type: Sub-task
Reporter: Duo Zhang


I think we can do better for scan at least, as now we pass the mvcc to the 
client. We can use the mvcc to determine if we can get a consistent view when 
reading from replicas other than the primary.
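
A conceptual sketch of that check; all names are hypothetical:

{code}
// a replica that has caught up past the scan's mvcc read point can serve the
// scan without returning a view older than what the client already saw
boolean canServeConsistently(long replicaReadPoint, long scanMvccReadPoint) {
  return replicaReadPoint >= scanMvccReadPoint;
}
{code}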



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17345) Implement batch

2016-12-21 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17345:
--
Attachment: HBASE-17345.patch

A first version. It implements a general batch method; multi get, multi put 
and multi delete all depend on it.

Will add more comments and tests in the next patch.
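
So the per-type multi operations presumably reduce to thin wrappers; a sketch with approximate signatures:

{code}
// sketch: assumes a generic <T> List<CompletableFuture<T>> batch(List<? extends Row> actions)
default List<CompletableFuture<Result>> get(List<Get> gets) {
  return batch(gets);
}

default List<CompletableFuture<Void>> delete(List<Delete> deletes) {
  return batch(deletes);
}
{code}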

> Implement batch
> ---
>
> Key: HBASE-17345
> URL: https://issues.apache.org/jira/browse/HBASE-17345
> Project: HBase
>  Issue Type: Sub-task
>  Components: asyncclient, Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17345.patch
>
>
> Add the support for general batch based on the code introduced in HBASE-17142.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-17345) Implement batch

2016-12-21 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-17345:
--
Assignee: Duo Zhang
  Status: Patch Available  (was: Open)

> Implement batch
> ---
>
> Key: HBASE-17345
> URL: https://issues.apache.org/jira/browse/HBASE-17345
> Project: HBase
>  Issue Type: Sub-task
>  Components: asyncclient, Client
>Affects Versions: 2.0.0
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: 2.0.0
>
> Attachments: HBASE-17345.patch
>
>
> Add the support for general batch based on the code introduced in HBASE-17142.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17328) Properly dispose of looped replication peers

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766854#comment-15766854
 ] 

Hudson commented on HBASE-17328:


SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #73 (See 
[https://builds.apache.org/job/HBase-1.3-JDK7/73/])
HBASE-17328 Properly dispose of looped replication peers (apurtell: rev 
7b3187c1a02eb875b2ba2daa49d43738f4dce8f8)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java


> Properly dispose of looped replication peers
> 
>
> Key: HBASE-17328
> URL: https://issues.apache.org/jira/browse/HBASE-17328
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0, 0.98.23
>Reporter: Vincent Poon
>Assignee: Vincent Poon
>Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.9
>
> Attachments: HBASE-17328-1.1.v1.patch, HBASE-17328-master.v1.patch, 
> HBASE-17328-master.v2.patch, HBASE-17328.0.98.v4.patch, 
> HBASE-17328.branch-1.1.v2.patch, HBASE-17328.branch-1.1.v3.patch, 
> HBASE-17328.branch-1.1.v4.patch, HBASE-17328.master.v4.patch
>
>
> When adding a looped replication peer (clusterId == peerClusterId), the 
> following code terminates the replication source thread, but since the source 
> manager still holds a reference, WALs continue to get enqueued, and never get 
> cleaned because they're stuck in the queue, leading to an unsustainable 
> buildup.  Furthermore, the replication statistics thread will continue to 
> print statistics for the terminated source.
> {code}
> if (clusterId.equals(peerClusterId) && 
> !replicationEndpoint.canReplicateToSameCluster()) {
>   this.terminate("ClusterId " + clusterId + " is replicating to itself: 
> peerClusterId "
>   + peerClusterId + " which is not allowed by ReplicationEndpoint:"
>   + replicationEndpoint.getClass().getName(), null, false);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11392) add/remove peer requests should be routed through master

2016-12-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766831#comment-15766831
 ] 

Hudson commented on HBASE-11392:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2171 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/2171/])
HBASE-11392 add/remove peer requests should be routed through master (zghao: 
rev e1f4aaeacdcbaffb02a08c29493601547c381941)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/MockNoopMasterServices.java
* (add) 
hbase-protocol-shaded/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/generated/ReplicationProtos.java
* (add) hbase-protocol-shaded/src/main/protobuf/Replication.proto
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/RequestConverter.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/replication/ReplicationAdmin.java
* (edit) 
hbase-protocol-shaded/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/generated/MasterProtos.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* (edit) src/main/asciidoc/_chapters/appendix_acl_matrix.adoc
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationFactory.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestSerialReplication.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelReplicationWithExpAsString.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationBase.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsReplication.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java
* (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Admin.java
* (edit) 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/replication/TestReplicationAdmin.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationWithTags.java
* (edit) hbase-protocol-shaded/src/main/protobuf/Master.proto
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java
* (add) 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/replication/ReplicationManager.java


> add/remove peer requests should be routed through master
> 
>
> Key: HBASE-11392
> URL: https://issues.apache.org/jira/browse/HBASE-11392
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-11392-v1.patch, HBASE-11392-v2.patch, 
> HBASE-11392-v3.patch, HBASE-11392-v4.patch, HBASE-11392-v5.patch, 
> HBASE-11392-v6.patch
>
>
> ReplicationAdmin directly operates over the zookeeper data for replication 
> setup. We should move these operations to be routed through master for two 
> reasons: 
>  - Replication implementation details are exposed to client. We should move 
> most of the replication related classes to hbase-server package. 
>  - Routing the requests through master is the standard practice for all other 
> operations. It allows for decoupling implementation details from the client 
> code.
> Review board: https://reviews.apache.org/r/54730/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

