[jira] [Reopened] (HBASE-22524) Refactor TestReplicationSyncUpTool

2020-08-10 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reopened HBASE-22524:


Reopen for cherry-pick to branch-2.2.

> Refactor TestReplicationSyncUpTool
> --
>
> Key: HBASE-22524
> URL: https://issues.apache.org/jira/browse/HBASE-22524
> Project: HBase
>  Issue Type: Test
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> In particular, TestReplicationSyncUpToolWithBulkLoadedData overrides a test 
> method, which makes it hard to change in the future.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-23974) [Flakey Tests] Allow that server may not yet be cleared from DeadServers in TestHBCKSCP

2020-08-09 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-23974:
---
Fix Version/s: 2.2.6

> [Flakey Tests] Allow that server may not yet be cleared from DeadServers in 
> TestHBCKSCP
> ---
>
> Key: HBASE-23974
> URL: https://issues.apache.org/jira/browse/HBASE-23974
> Project: HBase
>  Issue Type: Test
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.6
>
> Attachments: 
> 0001-HBASE-23974-Flakey-Tests-Allow-that-server-may-not-y.patch
>
>
> Fails 4/11 times in branch-2 flakey-test runs. It fails at line #157, where we 
> check whether the server is still in the dead-server list. It may not have 
> cleared yet. That's OK; let the test pass. We've already asserted that the 
> server is cleared from meta, which is the point of HBCKSCP.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23974) [Flakey Tests] Allow that server may not yet be cleared from DeadServers in TestHBCKSCP

2020-08-09 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-23974.

Resolution: Fixed

Pushed to branch-2.2.

> [Flakey Tests] Allow that server may not yet be cleared from DeadServers in 
> TestHBCKSCP
> ---
>
> Key: HBASE-23974
> URL: https://issues.apache.org/jira/browse/HBASE-23974
> Project: HBase
>  Issue Type: Test
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
> Attachments: 
> 0001-HBASE-23974-Flakey-Tests-Allow-that-server-may-not-y.patch
>
>
> Fails 4/11 times in branch-2 flakey-test runs. It fails at line #157, where we 
> check whether the server is still in the dead-server list. It may not have 
> cleared yet. That's OK; let the test pass. We've already asserted that the 
> server is cleared from meta, which is the point of HBCKSCP.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-23974) [Flakey Tests] Allow that server may not yet be cleared from DeadServers in TestHBCKSCP

2020-08-09 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reopened HBASE-23974:


Reopen for backport to branch-2.2.

> [Flakey Tests] Allow that server may not yet be cleared from DeadServers in 
> TestHBCKSCP
> ---
>
> Key: HBASE-23974
> URL: https://issues.apache.org/jira/browse/HBASE-23974
> Project: HBase
>  Issue Type: Test
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
> Attachments: 
> 0001-HBASE-23974-Flakey-Tests-Allow-that-server-may-not-y.patch
>
>
> Fails 4/11 times in branch-2 flakey-test runs. It fails at line #157, where we 
> check whether the server is still in the dead-server list. It may not have 
> cleared yet. That's OK; let the test pass. We've already asserted that the 
> server is cleared from meta, which is the point of HBCKSCP.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24819) Fix flaky test TestRaceBetweenSCPAndDTP and TestRaceBetweenSCPAndTRSP for branch-2.2

2020-08-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24819.

Fix Version/s: 2.2.6
   Resolution: Fixed

Pushed. Thanks [~meiyi] for reviewing.

> Fix flaky test TestRaceBetweenSCPAndDTP and TestRaceBetweenSCPAndTRSP for 
> branch-2.2
> 
>
> Key: HBASE-24819
> URL: https://issues.apache.org/jira/browse/HBASE-24819
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.2.6
>
>
> Backport HBASE-23805 and HBASE-24338



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24819) Fix flaky test TestRaceBetweenSCPAndDTP and TestRaceBetweenSCPAndTRSP for branch-2.2

2020-08-04 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24819:
---
Summary: Fix flaky test TestRaceBetweenSCPAndDTP and 
TestRaceBetweenSCPAndTRSP for branch-2.2  (was: Fix flaky test 
TestRaceBetweenSCPAndDTP for branch-2.2)

> Fix flaky test TestRaceBetweenSCPAndDTP and TestRaceBetweenSCPAndTRSP for 
> branch-2.2
> 
>
> Key: HBASE-24819
> URL: https://issues.apache.org/jira/browse/HBASE-24819
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> Backport HBASE-23805 and HBASE-24338



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24819) Fix flaky test TestRaceBetweenSCPAndDTP for branch-2.2

2020-08-04 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-24819:
--

Assignee: Guanghao Zhang

> Fix flaky test TestRaceBetweenSCPAndDTP for branch-2.2
> --
>
> Key: HBASE-24819
> URL: https://issues.apache.org/jira/browse/HBASE-24819
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> Backport HBASE-23805 and HBASE-24338



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24819) Fix flaky test TestRaceBetweenSCPAndDTP for branch-2.2

2020-08-04 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24819:
--

 Summary: Fix flaky test TestRaceBetweenSCPAndDTP for branch-2.2
 Key: HBASE-24819
 URL: https://issues.apache.org/jira/browse/HBASE-24819
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


Backport HBASE-23805 and HBASE-24338



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24818) Fix the precommit error for branch-1

2020-08-04 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24818.

Resolution: Duplicate

Sorry, this is a duplicate of HBASE-24816.

> Fix the precommit error for branch-1
> 
>
> Key: HBASE-24818
> URL: https://issues.apache.org/jira/browse/HBASE-24818
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Priority: Major
> Fix For: 1.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24818) Fix the precommit error for branch-1

2020-08-04 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24818:
--

 Summary: Fix the precommit error for branch-1
 Key: HBASE-24818
 URL: https://issues.apache.org/jira/browse/HBASE-24818
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24818) Fix the precommit error for branch-1

2020-08-04 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24818:
---
Fix Version/s: 1.7.0

> Fix the precommit error for branch-1
> 
>
> Key: HBASE-24818
> URL: https://issues.apache.org/jira/browse/HBASE-24818
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Priority: Major
> Fix For: 1.7.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24812) Fix the precommit error for branch-2.2

2020-08-03 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24812.

Fix Version/s: 2.2.6
   Resolution: Fixed

> Fix the precommit error for branch-2.2
> --
>
> Key: HBASE-24812
> URL: https://issues.apache.org/jira/browse/HBASE-24812
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.7.0, 2.2.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.2.6
>
>
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/view/change-requests/job/PR-2187/2/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24812) Fix the precommit error for branch-2.2

2020-08-03 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-24812:
--

Assignee: Guanghao Zhang

> Fix the precommit error for branch-2.2
> --
>
> Key: HBASE-24812
> URL: https://issues.apache.org/jira/browse/HBASE-24812
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.7.0, 2.2.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/view/change-requests/job/PR-2187/2/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24812) Fix the precommit error for branch-2.2

2020-08-03 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170501#comment-17170501
 ] 

Guanghao Zhang commented on HBASE-24812:


Open a new issue for branch-1? As the issue title says, this one is for branch-2.2.

> Fix the precommit error for branch-2.2
> --
>
> Key: HBASE-24812
> URL: https://issues.apache.org/jira/browse/HBASE-24812
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.7.0, 2.2.6
>Reporter: Guanghao Zhang
>Priority: Major
>
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/view/change-requests/job/PR-2187/2/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24812) Fix the precommit error for branch-2.2

2020-08-03 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24812:
--

 Summary: Fix the precommit error for branch-2.2
 Key: HBASE-24812
 URL: https://issues.apache.org/jira/browse/HBASE-24812
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.2.6
Reporter: Guanghao Zhang


[https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/view/change-requests/job/PR-2187/2/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23634) Enable "Split WAL to HFile" by default

2020-07-23 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164052#comment-17164052
 ] 

Guanghao Zhang commented on HBASE-23634:


{quote}Say if HFile was placed under region/cf/recovered.edits/ dir, we could have cleaned it up before doing the next attempt. 
{quote}
Yes, this can work. So is it OK to have a corrupted file during the WAL split 
process?

> Enable "Split WAL to HFile" by default
> --
>
> Key: HBASE-23634
> URL: https://issues.apache.org/jira/browse/HBASE-23634
> Project: HBase
>  Issue Type: Task
>Affects Versions: 3.0.0-alpha-1, 2.3.0
>Reporter: Guanghao Zhang
>Priority: Blocker
> Fix For: 3.0.0-alpha-1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-24749) Direct insert HFiles and Persist in-memory HFile tracking

2020-07-23 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164048#comment-17164048
 ] 

Guanghao Zhang edited comment on HBASE-24749 at 7/24/20, 12:19 AM:
---

bq. it should be HBASE-20724, so for compaction we can reused that to confirm 
if the flushed StoreFile were from a compaction.

 

Yes. The compaction event marker in WAL is not used anymore.


was (Author: zghaobac):
{quote}{quote} it should be HBASE-20724, so for compaction we can reused that 
to confirm if the flushed StoreFile were from a compaction.
{quote}{quote}
Yes. The compaction event marker in WAL is not used anymore.

> Direct insert HFiles and Persist in-memory HFile tracking
> -
>
> Key: HBASE-24749
> URL: https://issues.apache.org/jira/browse/HBASE-24749
> Project: HBase
>  Issue Type: Umbrella
>  Components: Compaction, HFile
>Affects Versions: 3.0.0-alpha-1
>Reporter: Tak-Lon (Stephen) Wu
>Assignee: Tak-Lon (Stephen) Wu
>Priority: Major
>  Labels: design, discussion, objectstore, storeFile, storeengine
> Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct 
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}} 
> directory used in the commit stage for common HFile operations such as flush 
> and compaction to improve the write throughput and latency on object stores. 
> Specifically for S3 filesystems, this will also mitigate read-after-write 
> inconsistencies caused by immediate HFiles validation after moving the 
> HFile(s) to data directory.
> Please see attached for this proposal and the initial result captured with 
> 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and RUN, 
> and workload C RUN result.
> The goal of this JIRA is to discuss with the community whether the proposed 
> improvement for the object-store use case makes sense and whether we missed 
> anything that should be included.
> Improvement Highlights
>  1. Lower write latency, especially the p99+
>  2. Higher write throughput on flush and compaction 
>  3. Lower MTTR on region (re)open or assignment 
>  4. Remove consistent check dependencies (e.g. DynamoDB) supported by file 
> system implementation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-24749) Direct insert HFiles and Persist in-memory HFile tracking

2020-07-23 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164048#comment-17164048
 ] 

Guanghao Zhang edited comment on HBASE-24749 at 7/24/20, 12:18 AM:
---

{quote}{quote} it should be HBASE-20724, so for compaction we can reused that 
to confirm if the flushed StoreFile were from a compaction.
{quote}{quote}
Yes. The compaction event marker in WAL is not used anymore.


was (Author: zghaobac):
{quote}bq. it should be HBASE-20724, so for compaction we can reused that to 
confirm if the flushed StoreFile were from a compaction.
{quote}
Yes. The compaction event marker is not used anymore.

> Direct insert HFiles and Persist in-memory HFile tracking
> -
>
> Key: HBASE-24749
> URL: https://issues.apache.org/jira/browse/HBASE-24749
> Project: HBase
>  Issue Type: Umbrella
>  Components: Compaction, HFile
>Affects Versions: 3.0.0-alpha-1
>Reporter: Tak-Lon (Stephen) Wu
>Assignee: Tak-Lon (Stephen) Wu
>Priority: Major
>  Labels: design, discussion, objectstore, storeFile, storeengine
> Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct 
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}} 
> directory used in the commit stage for common HFile operations such as flush 
> and compaction to improve the write throughput and latency on object stores. 
> Specifically for S3 filesystems, this will also mitigate read-after-write 
> inconsistencies caused by immediate HFiles validation after moving the 
> HFile(s) to data directory.
> Please see attached for this proposal and the initial result captured with 
> 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and RUN, 
> and workload C RUN result.
> The goal of this JIRA is to discuss with the community whether the proposed 
> improvement for the object-store use case makes sense and whether we missed 
> anything that should be included.
> Improvement Highlights
>  1. Lower write latency, especially the p99+
>  2. Higher write throughput on flush and compaction 
>  3. Lower MTTR on region (re)open or assignment 
>  4. Remove consistent check dependencies (e.g. DynamoDB) supported by file 
> system implementation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24749) Direct insert HFiles and Persist in-memory HFile tracking

2020-07-23 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164048#comment-17164048
 ] 

Guanghao Zhang commented on HBASE-24749:


{quote}bq. it should be HBASE-20724, so for compaction we can reused that to 
confirm if the flushed StoreFile were from a compaction.
{quote}
Yes. The compaction event marker is not used anymore.

> Direct insert HFiles and Persist in-memory HFile tracking
> -
>
> Key: HBASE-24749
> URL: https://issues.apache.org/jira/browse/HBASE-24749
> Project: HBase
>  Issue Type: Umbrella
>  Components: Compaction, HFile
>Affects Versions: 3.0.0-alpha-1
>Reporter: Tak-Lon (Stephen) Wu
>Assignee: Tak-Lon (Stephen) Wu
>Priority: Major
>  Labels: design, discussion, objectstore, storeFile, storeengine
> Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct 
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}} 
> directory used in the commit stage for common HFile operations such as flush 
> and compaction to improve the write throughput and latency on object stores. 
> Specifically for S3 filesystems, this will also mitigate read-after-write 
> inconsistencies caused by immediate HFiles validation after moving the 
> HFile(s) to data directory.
> Please see attached for this proposal and the initial result captured with 
> 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and RUN, 
> and workload C RUN result.
> The goal of this JIRA is to discuss with the community whether the proposed 
> improvement for the object-store use case makes sense and whether we missed 
> anything that should be included.
> Improvement Highlights
>  1. Lower write latency, especially the p99+
>  2. Higher write throughput on flush and compaction 
>  3. Lower MTTR on region (re)open or assignment 
>  4. Remove consistent check dependencies (e.g. DynamoDB) supported by file 
> system implementation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24743) Reject to add a peer which replicate to itself earlier

2020-07-23 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24743.

Resolution: Fixed

All UTs passed. Pushed to branch-2 and master.

> Reject to add a peer which replicate to itself earlier
> --
>
> Key: HBASE-24743
> URL: https://issues.apache.org/jira/browse/HBASE-24743
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Now there is a check in the ReplicationSource#initialize method:
> {code:java}
> // In rare case, zookeeper setting may be messed up. That leads to the 
> incorrect
> // peerClusterId value, which is the same as the source clusterId
> if (clusterId.equals(peerClusterId) && 
> !replicationEndpoint.canReplicateToSameCluster()) {
>   this.terminate("ClusterId " + clusterId + " is replicating to itself: 
> peerClusterId "
>   + peerClusterId + " which is not allowed by ReplicationEndpoint:"
>   + replicationEndpoint.getClass().getName(), null, false);
>   this.manager.removeSource(this);
>   return;
> }
> {code}
> This check should move to AddPeerProcedure's precheck.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23634) Enable "Split WAL to HFile" by default

2020-07-23 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163364#comment-17163364
 ] 

Guanghao Zhang commented on HBASE-23634:


{quote}We don't do that. If 'hbase.skip.errors' set to true we will atleast 
make the region open go ahead with out those files. If we don 't set it we will 
fail the region open.
{quote}
Yes. But I thought these should be cleaned up directly when opening the region.
{quote}Here if we do write the HFiles directly under thd CF, we have a chance 
that we will get to see a partially written HFile. 
{quote}
What is the effect on other systems (e.g. an offline compaction tool)?

> Enable "Split WAL to HFile" by default
> --
>
> Key: HBASE-23634
> URL: https://issues.apache.org/jira/browse/HBASE-23634
> Project: HBase
>  Issue Type: Task
>Affects Versions: 3.0.0-alpha-1, 2.3.0
>Reporter: Guanghao Zhang
>Priority: Blocker
> Fix For: 3.0.0-alpha-1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24619) Try compact the recovered hfiles firstly after region online

2020-07-21 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24619:
---
Parent: HBASE-23634
Issue Type: Sub-task  (was: Improvement)

> Try compact the recovered hfiles firstly after region online
> 
>
> Key: HBASE-24619
> URL: https://issues.apache.org/jira/browse/HBASE-24619
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.3.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> As discussed in HBASE-23739 and in HBASE-24632, there may be many recovered 
> hfiles. We should find a better way to compact them first after the region is online.
>  
> For instance (quoting our [~anoop.hbase]):
> "Assume there were some small files because of flush but never got compacted 
> before the RS down happened. We will look for the possible candidate from 
> oldest files and in all chance the very old files would get excluded because 
> of the size math. But It is possible that new flushed files would get 
> selected. And we have the max files to compact config also which is 10 by 
> default. Even these small files count alone might be >10. If there are say 15 
> WAL files to split, for sure we will have at least 15 small HFiles.
> My thinking was this. After the region open, we have to make sure these small 
> files are compacted in one go and we should not even consider the max files 
> limit for this compaction. Also to note that this files might not even have 
> the DBE/compression etc being applied. Ya coding wise am not sure how clean 
> it might come."
>  
> And from our [~pankaj2461]
>  
> "...concern is the compaction after region open, which impact MTTR due to 
> heavy IO in large cluster with many outstanding WALs"
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-23634) Enable "Split WAL to HFile" by default

2020-07-21 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161899#comment-17161899
 ] 

Guanghao Zhang edited comment on HBASE-23634 at 7/21/20, 9:51 AM:
--

{quote}What if the split is stopped in between. (WAL split RS died).. Also 
there may be other systems seeing directly the files under CF (Like offline 
compaction tool). How we can give a consistent view of the CF files to these 
other systems.
{quote}
For WAL replay, it replays edits and may flush many times if the memstore is 
big. And if the RS died before finishing replay, the other systems cannot get a 
consistent view of the CF files either? So I think HBase didn't have this 
guarantee before.


was (Author: zghaobac):
{quote}What if the split is stopped in between. (WAL split RS died).. Also 
there may be other systems seeing directly the files under CF (Like offline 
compaction tool). How we can give a consistent view of the CF files to these 
other systems.
{quote}
For the WAL replay, it replay edits and may flush many times if the memstore is 
big. And if RS died before replay finished, the other systems cannot get a  
consistent view of the CF files, too?

> Enable "Split WAL to HFile" by default
> --
>
> Key: HBASE-23634
> URL: https://issues.apache.org/jira/browse/HBASE-23634
> Project: HBase
>  Issue Type: Task
>Affects Versions: 3.0.0-alpha-1, 2.3.0
>Reporter: Guanghao Zhang
>Priority: Blocker
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23634) Enable "Split WAL to HFile" by default

2020-07-21 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161899#comment-17161899
 ] 

Guanghao Zhang commented on HBASE-23634:


{quote}What if the split is stopped in between. (WAL split RS died).. Also 
there may be other systems seeing directly the files under CF (Like offline 
compaction tool). How we can give a consistent view of the CF files to these 
other systems.
{quote}
For WAL replay, it replays edits and may flush many times if the memstore is 
big. And if the RS died before replay finished, the other systems cannot get a 
consistent view of the CF files either?

> Enable "Split WAL to HFile" by default
> --
>
> Key: HBASE-23634
> URL: https://issues.apache.org/jira/browse/HBASE-23634
> Project: HBase
>  Issue Type: Task
>Affects Versions: 3.0.0-alpha-1, 2.3.0
>Reporter: Guanghao Zhang
>Priority: Blocker
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24619) Try compact the recovered hfiles firstly after region online

2020-07-21 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-24619:
--

Assignee: Guanghao Zhang

> Try compact the recovered hfiles firstly after region online
> 
>
> Key: HBASE-24619
> URL: https://issues.apache.org/jira/browse/HBASE-24619
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> As discussed in HBASE-23739 and in HBASE-24632, there may have many recovered 
> hfiles. Should find a better way to compact them firstly after region online.
>  
> For instance (quoting our [~anoop.hbase]):
> "Assume there were some small files because of flush but never got compacted 
> before the RS down happened. We will look for the possible candidate from 
> oldest files and in all chance the very old files would get excluded because 
> of the size math. But It is possible that new flushed files would get 
> selected. And we have the max files to compact config also which is 10 by 
> default. Even these small files count alone might be >10. If there are say 15 
> WAL files to split, for sure we will have at least 15 small HFiles.
> My thinking was this. After the region open, we have to make sure these small 
> files are compacted in one go and we should not even consider the max files 
> limit for this compaction. Also to note that this files might not even have 
> the DBE/compression etc being applied. Ya coding wise am not sure how clean 
> it might come."
>  
> And from our [~pankaj2461]
>  
> "...concern is the compaction after region open, which impact MTTR due to 
> heavy IO in large cluster with many outstanding WALs"
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24743) Reject to add a peer which replicate to itself earlier

2020-07-20 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24743.

Fix Version/s: 2.4.0
   3.0.0-alpha-1
   Resolution: Fixed

Pushed to branch-2+. Thanks all for reviewing.

> Reject to add a peer which replicate to itself earlier
> --
>
> Key: HBASE-24743
> URL: https://issues.apache.org/jira/browse/HBASE-24743
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Now there is a check in the ReplicationSource#initialize method:
> {code:java}
> // In rare case, zookeeper setting may be messed up. That leads to the 
> incorrect
> // peerClusterId value, which is the same as the source clusterId
> if (clusterId.equals(peerClusterId) && 
> !replicationEndpoint.canReplicateToSameCluster()) {
>   this.terminate("ClusterId " + clusterId + " is replicating to itself: 
> peerClusterId "
>   + peerClusterId + " which is not allowed by ReplicationEndpoint:"
>   + replicationEndpoint.getClass().getName(), null, false);
>   this.manager.removeSource(this);
>   return;
> }
> {code}
> This check should move to AddPeerProcedure's precheck.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23634) Enable "Split WAL to HFile" by default

2020-07-20 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161640#comment-17161640
 ] 

Guanghao Zhang commented on HBASE-23634:


[~Bo Cui]  Thanks for your suggestion. So the new idea is not to load the 
recovered hfiles when opening the region; "read requests" read the recovered hfiles 
directly. Meanwhile, submit a compaction request to compact all the recovered 
hfiles; the compaction result is written to a new hfile in the CF dir. The 
benefits are that many NN RPCs are no longer needed, plus better MTTR.

Furthermore, how about writing the recovered hfiles to the CF dir directly? We can 
use the hfile name to distinguish which ones are recovered, as in the sketch below.
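To make the naming idea concrete, a minimal sketch (a hypothetical naming scheme, not what BoundedRecoveredHFilesOutputSink actually does): give recovered hfiles a distinguishing prefix when writing them into the CF directory, so the region can later tell them apart from normally flushed or compacted files and queue them for compaction first.

{code:java}
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

public class RecoveredHFileNaming {

  // Hypothetical prefix; a real implementation could equally use a suffix
  // or file metadata instead.
  static final String RECOVERED_PREFIX = "recovered-";

  // Name a recovered hfile written directly into the CF dir during WAL split.
  static String newRecoveredHFileName(String walName) {
    return RECOVERED_PREFIX + walName + "-" + UUID.randomUUID();
  }

  // Used at region open (or by a compaction trigger) to find recovered files.
  static boolean isRecovered(String hfileName) {
    return hfileName.startsWith(RECOVERED_PREFIX);
  }

  public static void main(String[] args) {
    List<String> cfDir = List.of(
        newRecoveredHFileName("wal.1594350463222"),
        "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6", // a normally flushed hfile name
        newRecoveredHFileName("wal.1594349535271"));

    // Only the recovered files would be handed to the "compact them first" path.
    List<String> toCompactFirst =
        cfDir.stream().filter(RecoveredHFileNaming::isRecovered).collect(Collectors.toList());
    System.out.println(toCompactFirst);
  }
}
{code}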

> Enable "Split WAL to HFile" by default
> --
>
> Key: HBASE-23634
> URL: https://issues.apache.org/jira/browse/HBASE-23634
> Project: HBase
>  Issue Type: Task
>Affects Versions: 3.0.0-alpha-1, 2.3.0
>Reporter: Guanghao Zhang
>Priority: Blocker
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23634) Enable "Split WAL to HFile" by default

2020-07-20 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161078#comment-17161078
 ] 

Guanghao Zhang commented on HBASE-23634:


{quote}before compaction, large number of small hfiles affect read and write 
performance of region.
{quote}
Only read performance. And isn't "reads with bad performance" better than "cannot read"?
{quote}hfile needs 3 NN RPCs to bulkload during 
openRegion(validate、rename、createReader)
{quote}
For this, I think the "validate" step may not be needed. For bulkload, it needs to 
check that the hfile fits the region start/end keys. But here, failover is an 
internal system operation, so it doesn't need the validate step.

And for the RPCs, WAL replay needs the same number of "createReader" RPCs for the 
recovered edits.

> Enable "Split WAL to HFile" by default
> --
>
> Key: HBASE-23634
> URL: https://issues.apache.org/jira/browse/HBASE-23634
> Project: HBase
>  Issue Type: Task
>Affects Versions: 3.0.0-alpha-1, 2.3.0
>Reporter: Guanghao Zhang
>Priority: Blocker
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24690) Set version to 2.2.6 in branch-2.2 for first RC of 2.2.6

2020-07-16 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24690:
---
Fix Version/s: 2.2.6

> Set version to 2.2.6 in branch-2.2 for first RC of 2.2.6
> 
>
> Key: HBASE-24690
> URL: https://issues.apache.org/jira/browse/HBASE-24690
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Priority: Major
> Fix For: 2.2.6
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24690) Set version to 2.2.6 in branch-2.2 for first RC of 2.2.6

2020-07-16 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24690.

  Assignee: Guanghao Zhang
Resolution: Fixed

> Set version to 2.2.6 in branch-2.2 for first RC of 2.2.6
> 
>
> Key: HBASE-24690
> URL: https://issues.apache.org/jira/browse/HBASE-24690
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.2.6
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24467) Backport HBASE-23963: Split TestFromClientSide; it takes too long to complete timing out

2020-07-16 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24467.

Resolution: Fixed

> Backport HBASE-23963: Split TestFromClientSide; it takes too long to complete 
> timing out
> 
>
> Key: HBASE-24467
> URL: https://issues.apache.org/jira/browse/HBASE-24467
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.2.5
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.2.6
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24721) rename_rsgroup overwriting the existing rsgroup.

2020-07-15 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158817#comment-17158817
 ] 

Guanghao Zhang commented on HBASE-24721:


+1 for branch-2.2. Let me cherry-pick it.

> rename_rsgroup overwriting the existing rsgroup.
> 
>
> Key: HBASE-24721
> URL: https://issues.apache.org/jira/browse/HBASE-24721
> Project: HBase
>  Issue Type: Bug
>Reporter: chiranjeevi
>Assignee: Mohammad Arshad
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0
>
>
> rename_rsgroup overwrites the existing rsgroup.
> Steps:
> 1) add_rsgroup 'RSG1' and 'RSG2'
> 2) move_servers_rsgroup 'RSG1',['server1:port']
> 3) rename_rsgroup 'RSG1','RSG2'
>  After performing step 3, RSG1 overwrites RSG2 and the region servers added to 
> RSG1 are no longer available.
> Ideally the system should show the error message "Group already exists: RSG2".
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24721) rename_rsgroup overwriting the existing rsgroup.

2020-07-15 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24721:
---
Fix Version/s: 2.2.6

> rename_rsgroup overwriting the existing rsgroup.
> 
>
> Key: HBASE-24721
> URL: https://issues.apache.org/jira/browse/HBASE-24721
> Project: HBase
>  Issue Type: Bug
>Reporter: chiranjeevi
>Assignee: Mohammad Arshad
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.6
>
>
> rename_rsgroup overwrites the existing rsgroup.
> Steps:
> 1) add_rsgroup 'RSG1' and 'RSG2'
> 2) move_servers_rsgroup 'RSG1',['server1:port']
> 3) rename_rsgroup 'RSG1','RSG2'
>  After performing step 3, RSG1 overwrites RSG2 and the region servers added to 
> RSG1 are no longer available.
> Ideally the system should show the error message "Group already exists: RSG2".
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24615) MutableRangeHistogram#updateSnapshotRangeMetrics doesn't calculate the distribution for last bucket.

2020-07-15 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24615:
---
Fix Version/s: 2.2.6

> MutableRangeHistogram#updateSnapshotRangeMetrics doesn't calculate the 
> distribution for last bucket.
> 
>
> Key: HBASE-24615
> URL: https://issues.apache.org/jira/browse/HBASE-24615
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 2.3.0, master, 1.3.7, 2.2.6
>Reporter: Rushabh Shah
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0, 2.2.6
>
>
> We are not processing the distribution for last bucket. 
> https://github.com/apache/hbase/blob/master/hbase-hadoop-compat/src/main/java/org/apache/hadoop/metrics2/lib/MutableRangeHistogram.java#L70
> {code:java}
>   public void updateSnapshotRangeMetrics(MetricsRecordBuilder 
> metricsRecordBuilder,
>  Snapshot snapshot) {
> long priorRange = 0;
> long cumNum = 0;
> final long[] ranges = getRanges();
> final String rangeType = getRangeType();
> for (int i = 0; i < ranges.length - 1; i++) { -> The bug lies 
> here. We are not processing last bucket.
>   long val = snapshot.getCountAtOrBelow(ranges[i]);
>   if (val - cumNum > 0) {
> metricsRecordBuilder.addCounter(
> Interns.info(name + "_" + rangeType + "_" + priorRange + "-" + 
> ranges[i], desc),
> val - cumNum);
>   }
>   priorRange = ranges[i];
>   cumNum = val;
> }
> long val = snapshot.getCount();
> if (val - cumNum > 0) {
>   metricsRecordBuilder.addCounter(
>   Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 
> 1] + "-inf", desc),
>   val - cumNum);
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24615) MutableRangeHistogram#updateSnapshotRangeMetrics doesn't calculate the distribution for last bucket.

2020-07-15 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158810#comment-17158810
 ] 

Guanghao Zhang commented on HBASE-24615:


{quote}The snippet you shared is for overflow bucket (for the count that falls 
outside of ranges[length-1])
In this jira we fixed the case where we are not updating the count between the 
range: ranges[length-2] and ranges[length-1]
{quote}
 

Got it. +1 for branch-2.2. Let me cherry-pick it.
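For readers following along, a minimal plain-Java sketch of the bucket math (not the actual MutableRangeHistogram code): with the old bound {{i < ranges.length - 1}} the finite bucket between {{ranges[length-2]}} and {{ranges[length-1]}} is never emitted; iterating over every range fixes that, while the trailing block still emits the overflow ("-inf") bucket.

{code:java}
import java.util.Arrays;

public class RangeHistogramSketch {

  // Toy stand-in for Snapshot#getCountAtOrBelow: counts values <= bound.
  static long countAtOrBelow(long[] values, long bound) {
    return Arrays.stream(values).filter(v -> v <= bound).count();
  }

  public static void main(String[] args) {
    long[] ranges = {10, 100, 1000};    // bucket upper bounds
    long[] values = {5, 50, 500, 5000}; // one sample per bucket

    long priorRange = 0;
    long cumNum = 0;
    // The old loop bound "i < ranges.length - 1" silently dropped the
    // 100-1000 bucket; iterating over all ranges emits it.
    for (int i = 0; i < ranges.length; i++) {
      long val = countAtOrBelow(values, ranges[i]);
      if (val - cumNum > 0) {
        System.out.println(priorRange + "-" + ranges[i] + ": " + (val - cumNum));
      }
      priorRange = ranges[i];
      cumNum = val;
    }
    // Overflow bucket, as in the snippet quoted in the description.
    long total = values.length;
    if (total - cumNum > 0) {
      System.out.println(ranges[ranges.length - 1] + "-inf: " + (total - cumNum));
    }
  }
}
{code}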

> MutableRangeHistogram#updateSnapshotRangeMetrics doesn't calculate the 
> distribution for last bucket.
> 
>
> Key: HBASE-24615
> URL: https://issues.apache.org/jira/browse/HBASE-24615
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 2.3.0, master, 1.3.7, 2.2.6
>Reporter: Rushabh Shah
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0
>
>
> We are not processing the distribution for last bucket. 
> https://github.com/apache/hbase/blob/master/hbase-hadoop-compat/src/main/java/org/apache/hadoop/metrics2/lib/MutableRangeHistogram.java#L70
> {code:java}
>   public void updateSnapshotRangeMetrics(MetricsRecordBuilder 
> metricsRecordBuilder,
>  Snapshot snapshot) {
> long priorRange = 0;
> long cumNum = 0;
> final long[] ranges = getRanges();
> final String rangeType = getRangeType();
> for (int i = 0; i < ranges.length - 1; i++) { -> The bug lies 
> here. We are not processing last bucket.
>   long val = snapshot.getCountAtOrBelow(ranges[i]);
>   if (val - cumNum > 0) {
> metricsRecordBuilder.addCounter(
> Interns.info(name + "_" + rangeType + "_" + priorRange + "-" + 
> ranges[i], desc),
> val - cumNum);
>   }
>   priorRange = ranges[i];
>   cumNum = val;
> }
> long val = snapshot.getCount();
> if (val - cumNum > 0) {
>   metricsRecordBuilder.addCounter(
>   Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 
> 1] + "-inf", desc),
>   val - cumNum);
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24615) MutableRangeHistogram#updateSnapshotRangeMetrics doesn't calculate the distribution for last bucket.

2020-07-15 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158799#comment-17158799
 ] 

Guanghao Zhang commented on HBASE-24615:


{code:java}
long val = snapshot.getCount();
if (val - cumNum > 0) {
  metricsRecordBuilder.addCounter(
  Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 1] 
+ "-inf", desc),
  val - cumNum);
}
{code}
I don't fully understand the fix. I thought the last bucket was already handled 
in the previous code, too?

> MutableRangeHistogram#updateSnapshotRangeMetrics doesn't calculate the 
> distribution for last bucket.
> 
>
> Key: HBASE-24615
> URL: https://issues.apache.org/jira/browse/HBASE-24615
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 2.3.0, master, 1.3.7, 2.2.6
>Reporter: Rushabh Shah
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 1.7.0, 2.4.0
>
>
> We are not processing the distribution for last bucket. 
> https://github.com/apache/hbase/blob/master/hbase-hadoop-compat/src/main/java/org/apache/hadoop/metrics2/lib/MutableRangeHistogram.java#L70
> {code:java}
>   public void updateSnapshotRangeMetrics(MetricsRecordBuilder 
> metricsRecordBuilder,
>  Snapshot snapshot) {
> long priorRange = 0;
> long cumNum = 0;
> final long[] ranges = getRanges();
> final String rangeType = getRangeType();
> for (int i = 0; i < ranges.length - 1; i++) { -> The bug lies 
> here. We are not processing last bucket.
>   long val = snapshot.getCountAtOrBelow(ranges[i]);
>   if (val - cumNum > 0) {
> metricsRecordBuilder.addCounter(
> Interns.info(name + "_" + rangeType + "_" + priorRange + "-" + 
> ranges[i], desc),
> val - cumNum);
>   }
>   priorRange = ranges[i];
>   cumNum = val;
> }
> long val = snapshot.getCount();
> if (val - cumNum > 0) {
>   metricsRecordBuilder.addCounter(
>   Interns.info(name + "_" + rangeType + "_" + ranges[ranges.length - 
> 1] + "-inf", desc),
>   val - cumNum);
> }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24467) Backport HBASE-23963: Split TestFromClientSide; it takes too long to complete timing out

2020-07-15 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158033#comment-17158033
 ] 

Guanghao Zhang commented on HBASE-24467:


Found an incompatibility issue while releasing 2.2.6.

> Backport HBASE-23963: Split TestFromClientSide; it takes too long to complete 
> timing out
> 
>
> Key: HBASE-24467
> URL: https://issues.apache.org/jira/browse/HBASE-24467
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.2.5
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.2.6
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24467) Backport HBASE-23963: Split TestFromClientSide; it takes too long to complete timing out

2020-07-15 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158032#comment-17158032
 ] 

Guanghao Zhang commented on HBASE-24467:


h2. Problems with Methods, High Severity: 1

hbase-shaded-testing-util-2.2.5.jar, HBaseCommonTestingUtility.class
package org.apache.hadoop.hbase
HBaseCommonTestingUtility.getRandomUUID ( ) : UUID
org/apache/hadoop/hbase/HBaseCommonTestingUtility.getRandomUUID:()Ljava/util/UUID;

|| ||Change||Effect||
|1|Method became *static*.|A client program may be interrupted by 
*NoSuchMethodError* exception.|

> Backport HBASE-23963: Split TestFromClientSide; it takes too long to complete 
> timing out
> 
>
> Key: HBASE-24467
> URL: https://issues.apache.org/jira/browse/HBASE-24467
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.2.5
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.2.6
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-24467) Backport HBASE-23963: Split TestFromClientSide; it takes too long to complete timing out

2020-07-15 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reopened HBASE-24467:


Reopen to add an addendum patch.

> Backport HBASE-23963: Split TestFromClientSide; it takes too long to complete 
> timing out
> 
>
> Key: HBASE-24467
> URL: https://issues.apache.org/jira/browse/HBASE-24467
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.2.5
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.2.6
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24689) Generate CHANGES.md and RELEASENOTES.md for 2.2.6

2020-07-15 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-24689:
--

Assignee: Guanghao Zhang

> Generate CHANGES.md and RELEASENOTES.md for 2.2.6
> -
>
> Key: HBASE-24689
> URL: https://issues.apache.org/jira/browse/HBASE-24689
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24689) Generate CHANGES.md and RELEASENOTES.md for 2.2.6

2020-07-15 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24689.

Fix Version/s: 2.2.6
   Resolution: Fixed

Pushed. Thanks [~meiyi] for reviewing.

> Generate CHANGES.md and RELEASENOTES.md for 2.2.6
> -
>
> Key: HBASE-24689
> URL: https://issues.apache.org/jira/browse/HBASE-24689
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.2.6
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HBASE-24689) Generate CHANGES.md and RELEASENOTES.md for 2.2.6

2020-07-15 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-24689 started by Guanghao Zhang.
--
> Generate CHANGES.md and RELEASENOTES.md for 2.2.6
> -
>
> Key: HBASE-24689
> URL: https://issues.apache.org/jira/browse/HBASE-24689
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24734) Wrong comparator opening Region when 'split-to-WAL' enabled.

2020-07-15 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157968#comment-17157968
 ] 

Guanghao Zhang commented on HBASE-24734:


The hfile creation only happens in BoundedRecoveredHFilesOutputSink? It 
uses the META comparator for the meta table.

> Wrong comparator opening Region when 'split-to-WAL' enabled.
> 
>
> Key: HBASE-24734
> URL: https://issues.apache.org/jira/browse/HBASE-24734
> Project: HBase
>  Issue Type: Sub-task
>  Components: HFile, MTTR
>Reporter: Michael Stack
>Priority: Major
>
> Came across this when we were testing the 'split-to-hfile' feature running 
> ITBLL:
>  
> {code:java}
> 2020-07-10 10:16:49,983 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Closing region hbase:meta,,1.15882307402020-07-10 10:16:49,997 INFO 
> org.apache.hadoop.hbase.regionserver.HRegion: Closed 
> hbase:meta,,1.15882307402020-07-10 10:16:49,998 WARN 
> org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler: Fatal error 
> occurred while opening region hbase:meta,,1.1588230740, 
> aborting...java.lang.IllegalArgumentException: Invalid range: 
> IntegrationTestBigLinkedList,,1594350463222.8f89e01a5245e79946e22d8a8ab4698b. 
> > 
> IntegrationTestBigLinkedList,\x10\x02J\xA1,1594349535271.be24dc276f686e6dcc7fb9d3f91c8387.
> at 
> org.apache.hadoop.hbase.client.RegionInfoBuilder$MutableRegionInfo.containsRange(RegionInfoBuilder.java:300)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.tryCommitRecoveredHFile(HStore.java:)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.loadRecoveredHFilesIfAny(HRegion.java:5442)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1010)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:950) 
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7490)   
>  at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7448)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7424)   
>  at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7382)   
>  at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7333)   
>  at 
> org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:135)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)  
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)2020-07-10 
> 10:16:50,005 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: * 
> ABORTING region server hbasedn149.example.org,16020,1594375563853: Failed to 
> open region hbase:meta,,1.1588230740 and can not recover 
> *java.lang.IllegalArgumentException: Invalid range: 
> IntegrationTestBigLinkedList,,1594350463222.8f89e01a5245e79946e22d8a8ab4698b. 
> > 
> IntegrationTestBigLinkedList,\x10\x02J\xA1,1594349535271.be24dc276f686e6dcc7fb9d3f91c8387.
>  {code}
> Seems basic case of wrong comparator. Below passes if I use the meta 
> comparator
> {code:java}
>  @Test
> public void testBinaryKeys() throws Exception {
>   Set set = new TreeSet<>(CellComparatorImpl.COMPARATOR);
>   final byte [] fam = Bytes.toBytes("col");
>   final byte [] qf = Bytes.toBytes("umn");
>   final byte [] nb = new byte[0];
>   Cell [] keys = {
>   createByteBufferKeyValueFromKeyValue(
>   new KeyValue(Bytes.toBytes("a,\u\u,2"), fam, qf, 2, 
> nb)),
>   createByteBufferKeyValueFromKeyValue(
>   new KeyValue(Bytes.toBytes("a,\u0001,3"), fam, qf, 3, nb)),
>   createByteBufferKeyValueFromKeyValue(
>   new KeyValue(Bytes.toBytes("a,,1"), fam, qf, 1, nb)),
>   createByteBufferKeyValueFromKeyValue(
>   new KeyValue(Bytes.toBytes("a,\u1000,5"), fam, qf, 5, nb)),
>   createByteBufferKeyValueFromKeyValue(
>   new KeyValue(Bytes.toBytes("a,a,4"), fam, qf, 4, nb)),
>   createByteBufferKeyValueFromKeyValue(
>   new KeyValue(Bytes.toBytes("a,a,0"), fam, qf, 0, nb)),
>   };
>   // Add to set with bad comparator
>   Collections.addAll(set, keys);
>   // This will output the keys incorrectly.
>   boolean assertion = false;
>   int count = 0;
>   try {
> for (Cell k: set) {
>   assertTrue("count=" + count + ", " + k.toString(), count++ == 
> k.getTimestamp());
> }
>   } catch (AssertionError e) {
> // Expected
> assertion = true;
>   }
>   assertTrue(assertion);
>   // Make set with 

[jira] [Assigned] (HBASE-24743) Reject to add a peer which replicate to itself earlier

2020-07-14 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-24743:
--

Assignee: Guanghao Zhang

> Reject to add a peer which replicate to itself earlier
> --
>
> Key: HBASE-24743
> URL: https://issues.apache.org/jira/browse/HBASE-24743
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> Now there is a check in the ReplicationSource#initialize method:
> {code:java}
> // In rare case, zookeeper setting may be messed up. That leads to the 
> incorrect
> // peerClusterId value, which is the same as the source clusterId
> if (clusterId.equals(peerClusterId) && 
> !replicationEndpoint.canReplicateToSameCluster()) {
>   this.terminate("ClusterId " + clusterId + " is replicating to itself: 
> peerClusterId "
>   + peerClusterId + " which is not allowed by ReplicationEndpoint:"
>   + replicationEndpoint.getClass().getName(), null, false);
>   this.manager.removeSource(this);
>   return;
> }
> {code}
> This check should move to AddPeerProcedure's precheck.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24743) Reject to add a peer which replicate to itself earlier

2020-07-14 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24743:
--

 Summary: Reject to add a peer which replicate to itself earlier
 Key: HBASE-24743
 URL: https://issues.apache.org/jira/browse/HBASE-24743
 Project: HBase
  Issue Type: Improvement
Reporter: Guanghao Zhang


Now there is a check in the ReplicationSource#initialize method:
{code:java}
// In rare case, zookeeper setting may be messed up. That leads to the incorrect
// peerClusterId value, which is the same as the source clusterId
if (clusterId.equals(peerClusterId) && 
!replicationEndpoint.canReplicateToSameCluster()) {
  this.terminate("ClusterId " + clusterId + " is replicating to itself: 
peerClusterId "
  + peerClusterId + " which is not allowed by ReplicationEndpoint:"
  + replicationEndpoint.getClass().getName(), null, false);
  this.manager.removeSource(this);
  return;
}
{code}
This check should move to AddPeerProcedure's precheck.
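As a rough illustration of moving the check earlier (hypothetical method and class names, not the actual AddPeerProcedure code): the same clusterId/peerClusterId comparison could run as a precheck when the peer is added, rejecting the request up front instead of starting a ReplicationSource and terminating it later.

{code:java}
import java.util.UUID;

public class AddPeerPrecheckSketch {

  // Hypothetical exception type standing in for a DoNotRetry-style failure.
  static class ReplicationPeerRejectedException extends Exception {
    ReplicationPeerRejectedException(String msg) { super(msg); }
  }

  /**
   * Precheck sketch: fail fast at add-peer time when the peer cluster is the
   * local cluster and the endpoint cannot replicate to the same cluster.
   */
  static void preCheckPeer(UUID clusterId, UUID peerClusterId,
      boolean endpointCanReplicateToSameCluster) throws ReplicationPeerRejectedException {
    if (clusterId.equals(peerClusterId) && !endpointCanReplicateToSameCluster) {
      throw new ReplicationPeerRejectedException(
          "ClusterId " + clusterId + " is replicating to itself: peerClusterId " + peerClusterId);
    }
  }

  public static void main(String[] args) {
    UUID local = UUID.randomUUID();
    try {
      preCheckPeer(local, local, false); // rejected before any source is created
    } catch (ReplicationPeerRejectedException e) {
      System.out.println("add_peer rejected: " + e.getMessage());
    }
  }
}
{code}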



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24578) [WAL] Add a parameter to config RingBufferEventHandler's SyncFuture count

2020-07-14 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157799#comment-17157799
 ] 

Guanghao Zhang commented on HBASE-24578:


This was pushed to branch-2.2+. So can we close this Jira?

> [WAL] Add a parameter to config RingBufferEventHandler's SyncFuture count
> -
>
> Key: HBASE-24578
> URL: https://issues.apache.org/jira/browse/HBASE-24578
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 1.4.13, 2.2.5
>Reporter: Reid Chan
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.2.6
>
>
> The current SyncFuture count of RingBufferEventHandler is the value of 
> {{hbase.regionserver.handler.count}}, which works well with the default wal 
> provider --- one WAL per regionserver.
> When using a WAL group provider, either by group or wal per region, the 
> default value is bad. If an RS has 100 regions and the wal-per-region strategy is 
> used, then the RS will allocate 100 * 
> SyncFuture[$hbase.regionserver.handler.count] arrays
> {code}
> int maxHandlersCount = conf.getInt(HConstants.REGION_SERVER_HANDLER_COUNT, 
> 200);
> this.ringBufferEventHandler = new RingBufferEventHandler(
> conf.getInt("hbase.regionserver.hlog.syncer.count", 5), 
> maxHandlersCount); 
> ...
> 
> RingBufferEventHandler(final int syncRunnerCount, final int maxHandlersCount) 
> {
>   this.syncFutures = new SyncFuture[maxHandlersCount];
>   ...
>  }
> {code} 
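A minimal sketch of the requested knob (the config key name below is hypothetical, not necessarily the one the patch adds): size the SyncFuture array from its own setting, falling back to the handler count, so a WAL-per-region provider need not allocate handler-count futures for every WAL.

{code:java}
import java.util.Properties;

public class SyncFutureCountSketch {

  // Placeholder for a SyncFuture slot.
  static class SyncFuture {}

  // Hypothetical key; current behavior reuses hbase.regionserver.handler.count.
  static final String SYNC_FUTURE_COUNT_KEY = "hbase.regionserver.wal.syncfuture.count";

  static SyncFuture[] allocateSyncFutures(Properties conf) {
    int handlerCount =
        Integer.parseInt(conf.getProperty("hbase.regionserver.handler.count", "200"));
    // New: allow overriding the per-WAL SyncFuture count; default to the old value.
    int syncFutureCount =
        Integer.parseInt(conf.getProperty(SYNC_FUTURE_COUNT_KEY, String.valueOf(handlerCount)));
    return new SyncFuture[syncFutureCount];
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(SYNC_FUTURE_COUNT_KEY, "16");
    // With 100 WALs this allocates 100 * 16 futures rather than 100 * 200.
    System.out.println(allocateSyncFutures(conf).length);
  }
}
{code}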



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23634) Enable "Split WAL to HFile" by default

2020-07-14 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157208#comment-17157208
 ] 

Guanghao Zhang commented on HBASE-23634:


[~Bo Cui] Can you show some data here? How many WALs and regions, and how many 
recovered hfiles? And is the bottleneck the open speed? I thought there are 
already some issues for the region-open problem.

> Enable "Split WAL to HFile" by default
> --
>
> Key: HBASE-23634
> URL: https://issues.apache.org/jira/browse/HBASE-23634
> Project: HBase
>  Issue Type: Task
>Affects Versions: 3.0.0-alpha-1, 2.3.0
>Reporter: Guanghao Zhang
>Priority: Blocker
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24578) [WAL] Add a parameter to config RingBufferEventHandler's SyncFuture count

2020-07-13 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24578:
---
Fix Version/s: (was: 2.2.7)
   2.2.6

> [WAL] Add a parameter to config RingBufferEventHandler's SyncFuture count
> -
>
> Key: HBASE-24578
> URL: https://issues.apache.org/jira/browse/HBASE-24578
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 1.4.13, 2.2.5
>Reporter: Reid Chan
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.2.6
>
>
> The SyncFuture array size in RingBufferEventHandler is currently taken from 
> {{hbase.regionserver.handler.count}}, which works well with the default wal 
> provider --- one WAL per regionserver.
> When using a WAL group provider, either by group or wal per region, this 
> default is bad: if a regionserver has 100 regions and the wal-per-region 
> strategy is used, it will allocate 100 
> SyncFuture[$hbase.regionserver.handler.count] arrays:
> {code}
> int maxHandlersCount = conf.getInt(HConstants.REGION_SERVER_HANDLER_COUNT, 
> 200);
> this.ringBufferEventHandler = new RingBufferEventHandler(
> conf.getInt("hbase.regionserver.hlog.syncer.count", 5), 
> maxHandlersCount); 
> ...
> 
> RingBufferEventHandler(final int syncRunnerCount, final int maxHandlersCount) 
> {
>   this.syncFutures = new SyncFuture[maxHandlersCount];
>   ...
>  }
> {code} 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24578) [WAL] Add a parameter to config RingBufferEventHandler's SyncFuture count

2020-07-13 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24578:
---
Fix Version/s: (was: 2.2.6)
   2.2.7

> [WAL] Add a parameter to config RingBufferEventHandler's SyncFuture count
> -
>
> Key: HBASE-24578
> URL: https://issues.apache.org/jira/browse/HBASE-24578
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 1.4.13, 2.2.5
>Reporter: Reid Chan
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.2.7
>
>
> The SyncFuture array size in RingBufferEventHandler is currently taken from 
> {{hbase.regionserver.handler.count}}, which works well with the default wal 
> provider --- one WAL per regionserver.
> When using a WAL group provider, either by group or wal per region, this 
> default is bad: if a regionserver has 100 regions and the wal-per-region 
> strategy is used, it will allocate 100 
> SyncFuture[$hbase.regionserver.handler.count] arrays:
> {code}
> int maxHandlersCount = conf.getInt(HConstants.REGION_SERVER_HANDLER_COUNT, 
> 200);
> this.ringBufferEventHandler = new RingBufferEventHandler(
> conf.getInt("hbase.regionserver.hlog.syncer.count", 5), 
> maxHandlersCount); 
> ...
> 
> RingBufferEventHandler(final int syncRunnerCount, final int maxHandlersCount) 
> {
>   this.syncFutures = new SyncFuture[maxHandlersCount];
>   ...
>  }
> {code} 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24737) Find a way to resolve WALFileLengthProvider#getLogFileSizeIfBeingWritten problem

2020-07-13 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24737:
--

 Summary: Find a way to resolve 
WALFileLengthProvider#getLogFileSizeIfBeingWritten problem
 Key: HBASE-24737
 URL: https://issues.apache.org/jira/browse/HBASE-24737
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


Now we use WALFileLengthProvider#getLogFileSizeIfBeingWritten to get the synced 
wal length and prevent replicating unacked log entries. But after offloading 
ReplicationSource to the new ReplicationServer, we need a new way to solve this 
problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24736) Remove ReplicationSourceInterface#enqueueLog method and get the WAL to replicate from ReplicationQueueStorage directly

2020-07-13 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24736:
--

 Summary: Remove ReplicationSourceInterface#enqueueLog method and 
get the WAL to replicate from ReplicationQueueStorage directly
 Key: HBASE-24736
 URL: https://issues.apache.org/jira/browse/HBASE-24736
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


There are two choices for this now (a rough sketch of the first option follows below).
 # Start a background thread that keeps reading new WALs from 
ReplicationQueueStorage.
 # The default ReplicationQueueStorage is based on ZK. Set a watcher on the znode and 
read the WALs when there is a zk event.
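A rough sketch of option 1, assuming a ReplicationQueueStorage#getWALsInQueue-style lookup and that queueStorage/serverName/queueId/pollIntervalMs are already in scope; the enqueueWal hand-off is a made-up placeholder:
{code:java}
// Illustrative background poller: periodically list the queue and hand any
// newly appeared WAL to the source's reader.
Thread walPoller = new Thread(() -> {
  Set<String> seen = new HashSet<>();
  while (!Thread.currentThread().isInterrupted()) {
    try {
      for (String wal : queueStorage.getWALsInQueue(serverName, queueId)) {
        if (seen.add(wal)) {
          source.enqueueWal(wal); // placeholder hand-off into the reader
        }
      }
      Thread.sleep(pollIntervalMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } catch (ReplicationException e) {
      LOG.warn("Failed to list WALs for queue " + queueId, e);
    }
  }
}, "replication-wal-poller");
walPoller.setDaemon(true);
walPoller.start();
{code}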



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24736) Remove ReplicationSourceInterface#enqueueLog method and get the new WAL to replicate from ReplicationQueueStorage directly

2020-07-13 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24736:
---
Summary: Remove ReplicationSourceInterface#enqueueLog method and get the 
new WAL to replicate from ReplicationQueueStorage directly  (was: Remove 
ReplicationSourceInterface#enqueueLog method and get the WAL to replicate from 
ReplicationQueueStorage directly)

> Remove ReplicationSourceInterface#enqueueLog method and get the new WAL to 
> replicate from ReplicationQueueStorage directly
> --
>
> Key: HBASE-24736
> URL: https://issues.apache.org/jira/browse/HBASE-24736
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Priority: Major
>
> There are two choices for this now.
>  # Start a background thread that keeps reading new WALs from 
> ReplicationQueueStorage.
>  # The default ReplicationQueueStorage is based on ZK. Set a watcher on the znode 
> and read the WALs when there is a zk event.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24735) Refactor ReplicationSourceManager: move logPositionAndCleanOldLogs/cleanUpHFileRefs to ReplicationSource inside

2020-07-13 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24735:
---
Description: After this, ReplicationSourceManager should only keep the 
methods which are used for startup/terminate ReplicationSource.

> Refactor ReplicationSourceManager: move 
> logPositionAndCleanOldLogs/cleanUpHFileRefs to ReplicationSource inside
> ---
>
> Key: HBASE-24735
> URL: https://issues.apache.org/jira/browse/HBASE-24735
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Priority: Major
>
> After this, ReplicationSourceManager should only keep the methods which are 
> used for startup/terminate ReplicationSource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24735) Refactor ReplicationSourceManager: move logPositionAndCleanOldLogs/cleanUpHFileRefs to ReplicationSource inside

2020-07-13 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24735:
---
Description: After this, ReplicationSourceManager should only keep the 
methods which are used for startup/terminate ReplicationSource. The 
startup/terminate related work will be moved to the new ReplicationServer.   (was: 
After this, ReplicationSourceManager should only keep the methods which are 
used for startup/terminate ReplicationSource.)

> Refactor ReplicationSourceManager: move 
> logPositionAndCleanOldLogs/cleanUpHFileRefs to ReplicationSource inside
> ---
>
> Key: HBASE-24735
> URL: https://issues.apache.org/jira/browse/HBASE-24735
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> After this, ReplicationSourceManager should only keep the methods which are 
> used for startup/terminate ReplicationSource. The startup/terminate related 
> work will be moved to the new ReplicationServer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24735) Refactor ReplicationSourceManager: move logPositionAndCleanOldLogs/cleanUpHFileRefs to ReplicationSource inside

2020-07-13 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-24735:
--

Assignee: Guanghao Zhang

> Refactor ReplicationSourceManager: move 
> logPositionAndCleanOldLogs/cleanUpHFileRefs to ReplicationSource inside
> ---
>
> Key: HBASE-24735
> URL: https://issues.apache.org/jira/browse/HBASE-24735
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> After this, ReplicationSourceManager should only keep the methods which are 
> used for startup/terminate ReplicationSource.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24735) Refactor ReplicationSourceManager: move logPositionAndCleanOldLogs/cleanUpHFileRefs to ReplicationSource inside

2020-07-13 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24735:
--

 Summary: Refactor ReplicationSourceManager: move 
logPositionAndCleanOldLogs/cleanUpHFileRefs to ReplicationSource inside
 Key: HBASE-24735
 URL: https://issues.apache.org/jira/browse/HBASE-24735
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24681) Remove the "cache" walsById/walsByIdRecoveredQueues from ReplicationSourceManager

2020-07-13 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24681.

Resolution: Fixed

> Remove the "cache" walsById/walsByIdRecoveredQueues from 
> ReplicationSourceManager
> -
>
> Key: HBASE-24681
> URL: https://issues.apache.org/jira/browse/HBASE-24681
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-23591) Negative memStoreSizing

2020-07-13 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-23591:
---
Fix Version/s: (was: 2.2.2)
   2.2.7

> Negative memStoreSizing
> ---
>
> Key: HBASE-23591
> URL: https://issues.apache.org/jira/browse/HBASE-23591
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Reporter: Szabolcs Bukros
>Priority: Major
> Fix For: 2.2.7
>
>
> After a flush on the replica region the memStoreSizing becomes negative:
> {code:java}
> 2019-12-17 08:31:59,983 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> 0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: 
> COMMIT_FLUSH table_name: "IntegrationTestRegionReplicaReplicati
> on" encoded_region_name: "544affde3e027454f67c8ea46c8f69ee" 
> flush_sequence_number: 41392 store_flushes { family_name: "f1" 
> store_home_dir: "f1" flush_output: "3c48a23eac784a348a18e10e337d80a2" } 
> store_flushes { family_name: "f2" store_home_dir: "f2" flush_output: 
> "9a5283ec95694667b4ead2398af5f01e" } store_flushes { family_name: "f3" 
> store_home_dir: "f3" flush_output: "e6f25e6b0eca4d22af15d0626d0f8759" } 
> region_name: 
> "IntegrationTestRegionReplicaReplication,,1576599911697.544affde3e027454f67c8ea46c8f69ee."
> 2019-12-17 08:31:59,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> 0beaae111b0f6e98bfde31ba35be5408 : Received a flush commit marker with 
> seqId:41392 and a previous prepared snapshot was found
> 2019-12-17 08:31:59,993 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Region: 0beaae111b0f6e98bfde31ba35be5408 added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f1/3c48a23eac784a348a18e10e337d80a2,
>  entries=32445, sequenceid=41392, filesize=27.6 M
> 2019-12-17 08:32:00,016 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Region: 0beaae111b0f6e98bfde31ba35be5408 added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f2/9a5283ec95694667b4ead2398af5f01e,
>  entries=12264, sequenceid=41392, filesize=10.9 M
> 2019-12-17 08:32:00,121 INFO org.apache.hadoop.hbase.regionserver.HStore: 
> Region: 0beaae111b0f6e98bfde31ba35be5408 added 
> hdfs://replica-1:8020/hbase/data/default/IntegrationTestRegionReplicaReplication/544affde3e027454f67c8ea46c8f69ee/f3/e6f25e6b0eca4d22af15d0626d0f8759,
>  entries=32379, sequenceid=41392, filesize=27.5 M
> 2019-12-17 08:32:00,122 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> CustomLog decrMemStoreSize. Current: dataSize=135810071, 
> getHeapSize=148400960, getOffHeapSize=0, getCellsCount=167243 delta: 
> dataSizeDelta=155923644, heapSizeDelta=170112320, offHeapSizeDelta=0, 
> cellsCountDelta=188399
> 2019-12-17 08:32:00,122 ERROR org.apache.hadoop.hbase.regionserver.HRegion: 
> Asked to modify this region's 
> (IntegrationTestRegionReplicaReplication,,1576599911697_0001.0beaae111b0f6e98bfde31ba35be54
> 08.) memStoreSizing to a negative value which is incorrect. Current 
> memStoreSizing=135810071, delta=-155923644
> java.lang.Exception
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkNegativeMemStoreDataSize(HRegion.java:1323)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1316)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.decrMemStoreSize(HRegion.java:1303)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushCommitMarker(HRegion.java:5194)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayWALFlushMarker(HRegion.java:5025)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doReplayBatchOp(RSRpcServices.java:1143)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:2232)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:29754)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> {code}
> I added some custom logging to the snapshot logic to be able to see snapshot 
> sizes: 
> {code:java}
> 2019-12-17 08:31:56,900 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> 0beaae111b0f6e98bfde31ba35be5408 : Replaying flush marker action: START_FLUSH 
> table_name: "IntegrationTestRegionReplicaReplication" encoded_region_name: 
> "544affde3e027454f67c8ea46c8f69ee" flush_sequence_number: 41392 store_flushes 
> { family_name: "f1" store_home_dir: "f1" } 

[jira] [Commented] (HBASE-24603) Zookeeper sync() call is async

2020-07-10 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17155263#comment-17155263
 ] 

Guanghao Zhang commented on HBASE-24603:


+1 for branch-2.2.  

> Zookeeper sync() call is async
> --
>
> Key: HBASE-24603
> URL: https://issues.apache.org/jira/browse/HBASE-24603
> Project: HBase
>  Issue Type: Improvement
>  Components: master, regionserver
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 1.7.0
>Reporter: Bharath Vissapragada
>Assignee: Bharath Vissapragada
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0
>
>
> Here is the method that does a sync() of lagging followers with leader in the 
> quorum. We rely on this to see a consistent snapshot of ZK data from multiple 
> clients. However the problem is that the underlying sync() call is actually 
> asynchronous since we are passing a 'null' call back.  See the ZK API 
> [doc|https://zookeeper.apache.org/doc/r3.5.7/apidocs/zookeeper-server/index.html]
>  for details. The end-result is that sync() doesn't guarantee that it has 
> happened by the time it returns.
> {noformat}
>   /**
>* Forces a synchronization of this ZooKeeper client connection.
>* 
>* Executing this method before running other methods will ensure that the
>* subsequent operations are up-to-date and consistent as of the time that
>* the sync is complete.
>* 
>* This is used for compareAndSwap type operations where we need to read the
>* data of an existing node and delete or transition that node, utilizing 
> the
>* previously read version and data.  We want to ensure that the version 
> read
>* is up-to-date from when we begin the operation.
>*/
>   public void sync(String path) throws KeeperException {
> this.recoverableZooKeeper.sync(path, null, null);
>   }
> {noformat}
> We rely on this heavily (at least in the older branches that do ZK based 
> region assignment). In branch-1 we saw weird "BadVersionException" exceptions 
> in RITs because of the inconsistent view of the ZK snapshot. It could 
> manifest differently in other branches. Either way, this is something we need 
> to fix.
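For reference, a blocking variant can be sketched by passing a real callback and waiting on it (sketch only, not necessarily the committed fix):
{code:java}
// Wait for the server to acknowledge the sync instead of passing a null callback.
public void sync(String path) throws KeeperException, InterruptedException {
  final CountDownLatch latch = new CountDownLatch(1);
  final AtomicInteger rc = new AtomicInteger();
  this.recoverableZooKeeper.sync(path, (code, syncedPath, ctx) -> {
    rc.set(code);
    latch.countDown();
  }, null);
  latch.await();
  if (rc.get() != KeeperException.Code.OK.intValue()) {
    throw KeeperException.create(KeeperException.Code.get(rc.get()), path);
  }
}
{code}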



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24663) Add procedure process time statistics UI

2020-07-10 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24663.

Fix Version/s: 2.4.0
   3.0.0-alpha-1
   Resolution: Fixed

Pushed to branch-2 and master. Thanks [~Joseph295] for contributing.

> Add procedure process time statistics UI
> 
>
> Key: HBASE-24663
> URL: https://issues.apache.org/jira/browse/HBASE-24663
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Junhong Xu
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
> Attachments: screenshot-1.png
>
>
> Added in "Procedures & Locks" jsp.
> For the first version UI, we care about the process time of 
> ServerCrashProcedure, TRSP, OpenRegionProcedure and CloseRegionProcedure. 
> Plan to show the avg/P50/P90/min/max process time of these procedures.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24653) Show snapshot owner on Master WebUI

2020-07-10 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24653.

Fix Version/s: 2.4.0
   3.0.0-alpha-1
   Resolution: Fixed

Pushed to branch-2 and master. Thanks [~niuyulin] for contributing.

> Show snapshot owner on Master WebUI
> ---
>
> Key: HBASE-24653
> URL: https://issues.apache.org/jira/browse/HBASE-24653
> Project: HBase
>  Issue Type: Improvement
>Reporter: Yi Mei
>Assignee: niuyulin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
> Attachments: snapshot1.png, snapshot2.png
>
>
> Now the Master UI shows lots of snapshot information, and the owner is also useful 
> for finding out who created a snapshot.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24489) Rewrite TestClusterRestartFailover.test since namespace table is gone on master

2020-07-09 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24489.

Fix Version/s: 3.0.0-alpha-1
   Resolution: Fixed

Pushed to master. Thanks [~Ddupg] for contributing.

> Rewrite TestClusterRestartFailover.test since namespace table is gone on 
> master
> --
>
> Key: HBASE-24489
> URL: https://issues.apache.org/jira/browse/HBASE-24489
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: Duo Zhang
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> We still have this
> {code}
> // Find server that does not have hbase:namespace on it. This test holds 
> up SCPs. If it
> // holds up the server w/ hbase:namespace, the Master initialization will 
> be held up
> // because this table is not online and test fails.
> for (JVMClusterUtil.RegionServerThread rst:
> UTIL.getHBaseCluster().getLiveRegionServerThreads()) {
>   HRegionServer rs = rst.getRegionServer();
>   if (rs.getRegions(TableName.NAMESPACE_TABLE_NAME).isEmpty()) {
> SERVER_FOR_TEST = rs.getServerName();
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-22738) Fallback to default group to choose RS when there are no RS in current group

2020-07-09 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-22738.

Fix Version/s: 2.4.0
   3.0.0-alpha-1
   Resolution: Fixed

Pushed to branch-2 and master. Thanks [~Ddupg] for contributing. Please help to 
add a release note, thanks.

> Fallback to default group to choose RS when there are no RS in current group
> 
>
> Key: HBASE-22738
> URL: https://issues.apache.org/jira/browse/HBASE-22738
> Project: HBase
>  Issue Type: Bug
>  Components: rsgroup
>Reporter: Guanghao Zhang
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> We configure one regionserver for the hbase system tables. But during a rolling 
> upgrade, you need to move the regions to other regionservers, and because there 
> are no other regionservers in this group, you cannot move the regions...
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24663) Add procedure process time statistics UI

2020-07-08 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17153302#comment-17153302
 ] 

Guanghao Zhang commented on HBASE-24663:


ServerCrashProcedure only took 7 ms?

> Add procedure process time statistics UI
> 
>
> Key: HBASE-24663
> URL: https://issues.apache.org/jira/browse/HBASE-24663
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Junhong Xu
>Priority: Major
> Attachments: screenshot-1.png
>
>
> Added in "Procedures & Locks" jsp.
> For the first version UI, we care about the process time of 
> ServerCrashProcedure, TRSP, OpenRegionProcedure and CloseRegionProcedure. 
> Plan to show the avg/P50/P90/min/max process time of these procedures.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24431) RSGroupInfo add configuration map to store something extra

2020-07-08 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24431.

Resolution: Fixed

Pushed to branch-2 and master. Thanks [~Ddupg] for contributing.

> RSGroupInfo add configuration map to store something extra
> --
>
> Key: HBASE-24431
> URL: https://issues.apache.org/jira/browse/HBASE-24431
> Project: HBase
>  Issue Type: Improvement
>  Components: rsgroup
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Maybe we should add a _Map configuration_ field to RSGroupInfo to 
> store extra information.
> For example, we can store the minimum number of machines the group needs, in 
> order to move machines into this group automatically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24682) Refactor ReplicationSource#addHFileRefs method: move it to ReplicationSourceManager

2020-07-08 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24682.

Resolution: Fixed

Pushed to HBASE-24666 branch. Thanks [~wchevreuil] for reviewing.

> Refactor ReplicationSource#addHFileRefs method: move it to 
> ReplicationSourceManager
> ---
>
> Key: HBASE-24682
> URL: https://issues.apache.org/jira/browse/HBASE-24682
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24691) Fix flaky TestWALEntryStream

2020-07-07 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24691.

Resolution: Duplicate

Resolved as duplicate.

> Fix flaky TestWALEntryStream
> 
>
> Key: HBASE-24691
> URL: https://issues.apache.org/jira/browse/HBASE-24691
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> [https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.2/lastSuccessfulBuild/artifact/dashboard.html]
>  
> Failed 100.0% (13 / 13) recently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24691) Fix flaky TestWALEntryStream

2020-07-06 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152437#comment-17152437
 ] 

Guanghao Zhang commented on HBASE-24691:


{code:java}
2020-07-07 09:17:39,962 INFO  [Time-limited test] 
regionserver.ReplicationSourceWALReader(115): peerClusterZnode=null, 
ReplicationSourceWALReaderThread : null inited, 
replicationBatchSizeCapacity=67108864, replicationBatchCountCapacity=10, 
replicationBatchQueueCapacity=1
2020-07-07 09:17:39,978 DEBUG [Thread-196] regionserver.WALEntryStream(251): 
Reached the end of log 
hdfs://localhost:44204/home/hao/open_source/hbase/hbase-server/target/test-data/3ee454c8-b764-b7c1-6312-9819838ebf2a/WALs/testReplicationSourceWALReaderRecovered/testReplicationSourceWALReaderRecovered.1594084659042
Exception in thread "Thread-196" java.lang.NullPointerException
  at 
org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.getSyncedLength(AsyncProtobufLogWriter.java:237)
  at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.getLogFileSizeIfBeingWritten(AbstractFSWAL.java:1017)
  at 
org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:264)
  at 
org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:188)
  at 
org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:101)
  at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.readWALEntries(ReplicationSourceWALReader.java:192)
  at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:138)
2020-07-07 09:30:30,177 DEBUG [Time-limited test] wal.AbstractFSWAL(858): Moved 
2 WAL file(s) to 
/home/hao/open_source/hbase/hbase-server/target/test-data/3ee454c8-b764-b7c1-6312-9819838ebf2a/oldWALs
{code}
Got NPE and the thread terminated.

> Fix flaky TestWALEntryStream
> 
>
> Key: HBASE-24691
> URL: https://issues.apache.org/jira/browse/HBASE-24691
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> [https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.2/lastSuccessfulBuild/artifact/dashboard.html]
>  
> Failed 100.0% (13 / 13) recently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24663) Add procedure process time statistics UI

2020-07-06 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152407#comment-17152407
 ] 

Guanghao Zhang commented on HBASE-24663:


Assigned to you [~Joseph295].

> Add procedure process time statistics UI
> 
>
> Key: HBASE-24663
> URL: https://issues.apache.org/jira/browse/HBASE-24663
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Junhong Xu
>Priority: Major
>
> Added in "Procedures & Locks" jsp.
> For the first version UI, we care about the process time of 
> ServerCrashProcedure, TRSP, OpenRegionProcedure and CloseRegionProcedure. 
> Plan to show the avg/P50/P90/min/max process time of these procedures.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24663) Add procedure process time statistics UI

2020-07-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-24663:
--

Assignee: Junhong Xu

> Add procedure process time statistics UI
> 
>
> Key: HBASE-24663
> URL: https://issues.apache.org/jira/browse/HBASE-24663
> Project: HBase
>  Issue Type: Improvement
>Reporter: Guanghao Zhang
>Assignee: Junhong Xu
>Priority: Major
>
> Added in "Procedures & Locks" jsp.
> For the first version UI, we care about the process time of 
> ServerCrashProcedure, TRSP, OpenRegionProcedure and CloseRegionProcedure. 
> Plan to show the avg/P50/P90/min/max process time of these procedures.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24691) Fix flaky TestWALEntryStream

2020-07-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-24691:
--

Assignee: Guanghao Zhang

> Fix flaky TestWALEntryStream
> 
>
> Key: HBASE-24691
> URL: https://issues.apache.org/jira/browse/HBASE-24691
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> [https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.2/lastSuccessfulBuild/artifact/dashboard.html]
>  
> Failed 100.0% (13 / 13) recently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24691) Fix flaky TestWALEntryStream

2020-07-06 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24691:
--

 Summary: Fix flaky TestWALEntryStream
 Key: HBASE-24691
 URL: https://issues.apache.org/jira/browse/HBASE-24691
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang


[https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.2/lastSuccessfulBuild/artifact/dashboard.html]

 

Failed 100.0% (13 / 13) recently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24690) Set version to 2.2.6 in branch-2.2 for first RC of 2.2.6

2020-07-06 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24690:
--

 Summary: Set version to 2.2.6 in branch-2.2 for first RC of 2.2.6
 Key: HBASE-24690
 URL: https://issues.apache.org/jira/browse/HBASE-24690
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24689) Generate CHANGES.md and RELEASENOTES.md for 2.2.6

2020-07-06 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24689:
--

 Summary: Generate CHANGES.md and RELEASENOTES.md for 2.2.6
 Key: HBASE-24689
 URL: https://issues.apache.org/jira/browse/HBASE-24689
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24666) Offload the replication source/sink job to independent Replication Server

2020-07-06 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151929#comment-17151929
 ] 

Guanghao Zhang commented on HBASE-24666:


{quote}you don't execute any CP hooks as part of compaction? Maybe you don't have 
such a need?
{quote}
Yes. For internal usage, we didn't consider this problem at all.

> Offload the replication source/sink job to independent Replication Server
> -
>
> Key: HBASE-24666
> URL: https://issues.apache.org/jira/browse/HBASE-24666
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Guanghao Zhang
>Priority: Major
>
> The basic idea is to add a role "ReplicationServer" to take over the replication 
> source/sink job. HMaster is responsible for scheduling the replication jobs to 
> different ReplicationServers.
> [link Design 
> doc|https://docs.google.com/document/d/16kRPVGctFSf__nC3yaVZmAm3GTxIbHefekKC_rMmTw8/edit?usp=sharing]
> Suggestions are welcomed. Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24684) Fetch ReplicationSink servers list from HMaster instead of ZooKeeper

2020-07-06 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24684:
--

 Summary: Fetch ReplicationSink servers list from HMaster instead 
of ZooKeeper
 Key: HBASE-24684
 URL: https://issues.apache.org/jira/browse/HBASE-24684
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24683) Add a basic ReplicationServer which only implement ReplicationSink Service

2020-07-06 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24683:
--

 Summary: Add a basic ReplicationServer which only implement 
ReplicationSink Service
 Key: HBASE-24683
 URL: https://issues.apache.org/jira/browse/HBASE-24683
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24666) Offload the replication source/sink job to independent Replication Server

2020-07-06 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151793#comment-17151793
 ] 

Guanghao Zhang commented on HBASE-24666:


{quote} I found that the CP hooks are becoming a blocker in moving things out of 
RS.
{quote}
Yes. CP is a blocker if it needs to access the RS during Replication/Compaction. We need 
to check when the CP hooks are called.

> Offload the replication source/sink job to independent Replication Server
> -
>
> Key: HBASE-24666
> URL: https://issues.apache.org/jira/browse/HBASE-24666
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Guanghao Zhang
>Priority: Major
>
> The basic idea is to add a role "ReplicationServer" to take over the replication 
> source/sink job. HMaster is responsible for scheduling the replication jobs to 
> different ReplicationServers.
> [link Design 
> doc|https://docs.google.com/document/d/16kRPVGctFSf__nC3yaVZmAm3GTxIbHefekKC_rMmTw8/edit?usp=sharing]
> Suggestions are welcomed. Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24593) [branch-2.2] Fix the maven compilation failure for nightly build

2020-07-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24593.

Fix Version/s: 2.2.6
   Resolution: Fixed

> [branch-2.2] Fix the maven compilation failure for nightly build
> 
>
> Key: HBASE-24593
> URL: https://issues.apache.org/jira/browse/HBASE-24593
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.2.6
>
>
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/896/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24562) Stabilize master startup with meta replicas enabled

2020-07-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24562:
---
Fix Version/s: 2.2.6

> Stabilize master startup with meta replicas enabled
> ---
>
> Key: HBASE-24562
> URL: https://issues.apache.org/jira/browse/HBASE-24562
> Project: HBase
>  Issue Type: Improvement
>  Components: meta, read replicas
>Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.4.0, 2.2.5
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.6
>
>
> This is related to HBASE-21624 . 
> I created a separate ticket because in the original one a "complete solution 
> for meta replicas" was requested and this is not one. I'm just trying to make 
> master startup more stable by making assigning meta replicas asynchronous and 
> preventing a potential assignment failure from crashing master.
> The idea is that starting master with less or even no meta replicas assigned 
> is preferable to not having a running master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24428) Priority compaction for recently split daughter regions

2020-07-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24428:
---
Fix Version/s: (was: 2.2.5)
   2.2.6

> Priority compaction for recently split daughter regions
> ---
>
> Key: HBASE-24428
> URL: https://issues.apache.org/jira/browse/HBASE-24428
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Andrew Kyle Purtell
>Assignee: Viraj Jasani
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 1.7.0, 2.2.6
>
>
> We observe that under hotspotting conditions that splitting will proceed very 
> slowly and the "_Cannot split region due to reference files being there_" log 
> line will be logged excessively. (branch-1 based production.) This is because 
> after a region is split it must be compacted before it can be split again. 
> Reference files must be replaced by real HFiles, normal housekeeping 
> performed during compaction. However if the regionserver is under excessive 
> load, its compaction queues may become deep. The daughters of a recently 
> split hotspotting region may themselves continue to hotspot and will rapidly 
> need to split again. If the scheduled compaction work to remove/replace 
> reference files is queued hundreds or thousands of compaction queue elements 
> behind current, the recently split daughter regions will not be able to split 
> again for a long time and may grow very large, producing additional 
> complications (very large regions, very deep replication queues).
> To help avoid this condition we should prioritize the compaction of recently 
> split daughter regions. Compaction requests include a {{priority}} field and 
> CompactionRequest implements a comparator that sorts by this field. We 
> already detect when a compaction request involves a region that has reference 
> files, to ensure that it gets selected to be eligible for compaction, but we 
> do not seem to prioritize the requests for post-split housekeeping. Split 
> work should be placed at the top of the queue. Ensure that this is happening.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24427) HStore.add log format error

2020-07-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24427:
---
Fix Version/s: (was: 2.2.5)
   2.2.6

> HStore.add log format error
> ---
>
> Key: HBASE-24427
> URL: https://issues.apache.org/jira/browse/HBASE-24427
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.3.0, master, 2.2.4
>Reporter: wenfeiyi666
>Assignee: wenfeiyi666
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.10, 2.2.6
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24426) Missing regionName while logging warning in HBCKServerCrashProcedure

2020-07-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24426:
---
Fix Version/s: (was: 2.2.5)
   2.2.6

> Missing regionName while logging warning in HBCKServerCrashProcedure
> 
>
> Key: HBASE-24426
> URL: https://issues.apache.org/jira/browse/HBASE-24426
> Project: HBase
>  Issue Type: Bug
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.6
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24280) Hadoop2 and Hadoop3 profiles being activated simultaneously causing test failures

2020-07-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24280:
---
Fix Version/s: 2.2.6

> Hadoop2 and Hadoop3 profiles being activated simultaneously causing test 
> failures
> -
>
> Key: HBASE-24280
> URL: https://issues.apache.org/jira/browse/HBASE-24280
> Project: HBase
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Istvan Toth
>Priority: Major
> Fix For: 2.3.0, 2.2.6
>
> Attachments: HBASE-24280.master.001.patch, 
> TEST-org.apache.hadoop.hbase.rest.TestSecureRESTServer.xml
>
>
> [~ndimiduk] pointed out that, after this change went in, TestSecureRESTServer 
> started failing with Hadoop3 on branch-2.3
> https://builds.apache.org/job/HBase%20Nightly/job/branch-2.3/56/
> Of course, I ran this with 1.8.0_241 and Maven 3.6.33 and it passed :) {{mvn 
> clean package -Dtest=TestSecureRESTServer -Dhadoop.profile=3.0 
> -DfailIfNoTests=false}}
> FYI [~stoty] in case you can repro a failure and want to dig in. Feel free to 
> re-assign.
> It looks like we didn't have a nightly run of branch-2.2 due to docker 
> container build issues. Will be interesting to see if it fails there. It did 
> not fail the master nightly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24102) RegionMover should exclude draining/decommissioning nodes from target RSs

2020-07-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24102:
---
Fix Version/s: (was: 2.2.5)
   2.2.6

> RegionMover should exclude draining/decommissioning nodes from target RSs
> -
>
> Key: HBASE-24102
> URL: https://issues.apache.org/jira/browse/HBASE-24102
> Project: HBase
>  Issue Type: Improvement
>Reporter: Anoop Sam John
>Assignee: Viraj Jasani
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.10, 2.2.6
>
>
> When using the RegionMover tool to unload the regions from a given RS, it decides 
> the list of destination RSs by:
> {code}
> List<ServerName> regionServers = new ArrayList<>();
> regionServers.addAll(admin.getRegionServers());
> // Remove the host Region server from target Region Servers list
> ServerName server = stripServer(regionServers, hostname, port);
> .
> // Remove RS present in the exclude file
> stripExcludes(regionServers);
> stripMaster(regionServers);
> {code}
> Yes, it removes the RSs mentioned in the exclude file.
> Better: when the RegionMover user does NOT specify any exclude list, we can also 
> exclude the draining/decommissioning RSs via 
> Admin#listDecommissionedRegionServers()
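A hedged sketch of that idea on top of the snippet above (the excludeFile check is simplified for illustration):
{code:java}
// When no explicit exclude list is given, also strip decommissioning/draining
// servers from the candidate target list.
List<ServerName> regionServers = new ArrayList<>(admin.getRegionServers());
ServerName server = stripServer(regionServers, hostname, port);
if (excludeFile == null) {
  regionServers.removeAll(admin.listDecommissionedRegionServers());
} else {
  stripExcludes(regionServers);
}
stripMaster(regionServers);
{code}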



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24022) Set version as 2.2.5-SNAPSHOT in branch-2.2

2020-07-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24022:
---
Fix Version/s: (was: 2.2.5)
   2.2.6

> Set version as 2.2.5-SNAPSHOT in branch-2.2
> ---
>
> Key: HBASE-24022
> URL: https://issues.apache.org/jira/browse/HBASE-24022
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 2.2.6
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-21905) TestFIFOCompactionPolicy is flaky

2020-07-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-21905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21905:
---
Fix Version/s: 2.2.6

> TestFIFOCompactionPolicy is flaky
> -
>
> Key: HBASE-21905
> URL: https://issues.apache.org/jira/browse/HBASE-21905
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-alpha-1, 1.5.0, 2.3.0
>Reporter: Andrew Kyle Purtell
>Assignee: Bharath Vissapragada
>Priority: Major
>  Labels: branch-1
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.4.0, 2.2.6
>
> Attachments: 
> org.apache.hadoop.hbase.regionserver.compactions.TestFIFOCompactionPolicy-output.txt,
>  testFIFOCompactionPolicyExpiredEmptyHFiles-failure-log.txt
>
>
> java.lang.IllegalArgumentException , overlaps with 
> For example:
> [ERROR] 
> testFIFOCompactionPolicyExpiredEmptyHFiles(org.apache.hadoop.hbase.regionserver.compactions.TestFIFOCompactionPolicy)
>   Time elapsed: 3.321 s  <<< ERROR!
> java.io.IOException: 
> java.io.IOException: 
> [hdfs://localhost:41525/user/apurtell/test-data/734de07d-1f22-46a9-a1f5-96ad4578450b/data/default/testFIFOCompactionPolicyExpiredEmptyHFiles/c4f673438e09d7ef5a9b79b363639cde/f/c0c5836c1f714f78847cf00326586b69,
>  
> hdfs://localhost:41525/user/apurtell/test-data/734de07d-1f22-46a9-a1f5-96ad4578450b/data/default/testFIFOCompactionPolicyExpiredEmptyHFiles/c4f673438e09d7ef5a9b79b363639cde/f/c65648691f614b2d8dd4b586c5923bfe]
>  overlaps with 
> [hdfs://localhost:41525/user/apurtell/test-data/734de07d-1f22-46a9-a1f5-96ad4578450b/data/default/testFIFOCompactionPolicyExpiredEmptyHFiles/c4f673438e09d7ef5a9b79b363639cde/f/c0c5836c1f714f78847cf00326586b69]
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2438)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>     at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
> Caused by: java.lang.IllegalArgumentException: 
> [hdfs://localhost:41525/user/apurtell/test-data/734de07d-1f22-46a9-a1f5-96ad4578450b/data/default/testFIFOCompactionPolicyExpiredEmptyHFiles/c4f673438e09d7ef5a9b79b363639cde/f/c0c5836c1f714f78847cf00326586b69,
>  
> hdfs://localhost:41525/user/apurtell/test-data/734de07d-1f22-46a9-a1f5-96ad4578450b/data/default/testFIFOCompactionPolicyExpiredEmptyHFiles/c4f673438e09d7ef5a9b79b363639cde/f/c65648691f614b2d8dd4b586c5923bfe]
>  overlaps with 
> [hdfs://localhost:41525/user/apurtell/test-data/734de07d-1f22-46a9-a1f5-96ad4578450b/data/default/testFIFOCompactionPolicyExpiredEmptyHFiles/c4f673438e09d7ef5a9b79b363639cde/f/c0c5836c1f714f78847cf00326586b69]
>     at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:119)
>     at 
> org.apache.hadoop.hbase.regionserver.HStore.addToCompactingFiles(HStore.java:1824)
>     at 
> org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1798)
>     at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.selectCompaction(CompactSplitThread.java:415)
>     at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompactionInternal(CompactSplitThread.java:388)
>     at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompactionInternal(CompactSplitThread.java:317)
>     at 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:306)
>     at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.compactRegion(RSRpcServices.java:1513)
>     at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:23649)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2380)
>     ... 3 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24546) CloneSnapshotProcedure unlimited retry

2020-07-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang resolved HBASE-24546.

Resolution: Fixed

Pushed to branch-2.2+. Thanks [~wenfeiyi666] for contributing.

> CloneSnapshotProcedure unlimited retry
> --
>
> Key: HBASE-24546
> URL: https://issues.apache.org/jira/browse/HBASE-24546
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.3.0, master, 2.2.5
>Reporter: wenfeiyi666
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.2.6
>
>
> Since the region dirs created in the previous execution were not removed, they need 
> to be removed when retrying; otherwise this results in an exception and unlimited retries:
> {code:java}
> procedure.CloneSnapshotProcedure: Retriable error trying to clone 
> snapshot=snapshot_test to table=test:backup 
> state=CLONE_SNAPSHOT_WRITE_FS_LAYOUT
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotException: clone snapshot={ 
> ss=snapshot_test table=test:backup type=FLUSH } failed because A clone should 
> not have regions to remove
> at 
> org.apache.hadoop.hbase.master.procedure.CloneSnapshotProcedure$1.createHdfsRegions(CloneSnapshotProcedure.java:434)
> at 
> org.apache.hadoop.hbase.master.procedure.CloneSnapshotProcedure.createFsLayout(CloneSnapshotProcedure.java:465)
> at 
> org.apache.hadoop.hbase.master.procedure.CloneSnapshotProcedure.createFilesystemLayout(CloneSnapshotProcedure.java:392)
> at 
> org.apache.hadoop.hbase.master.procedure.CloneSnapshotProcedure.executeFromState(CloneSnapshotProcedure.java:142)
> at 
> org.apache.hadoop.hbase.master.procedure.CloneSnapshotProcedure.executeFromState(CloneSnapshotProcedure.java:67)
> at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:194)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1662)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1409)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1979)
> Caused by: java.lang.IllegalArgumentException: A clone should not have 
> regions to remove
> at 
> org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:142)
> at 
> org.apache.hadoop.hbase.master.procedure.CloneSnapshotProcedure$1.createHdfsRegions(CloneSnapshotProcedure.java:418)
> ... 10 more
> {code}
> Also, the cloned regions' names are unchanged, so the newly created regions 
> would be removed when retrying.
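One hedged sketch of the cleanup idea (names assumed, not the committed patch): remove whatever a previous failed attempt left under the table dir before re-creating the layout, so the retry starts clean.
{code:java}
// Illustrative only: wipe leftovers from a previous failed attempt before
// re-running the CLONE_SNAPSHOT_WRITE_FS_LAYOUT step.
Path tableDir = CommonFSUtils.getTableDir(rootDir, tableDescriptor.getTableName());
if (fs.exists(tableDir) && !fs.delete(tableDir, true)) {
  throw new IOException("Couldn't delete leftover table dir " + tableDir + " before retry");
}
{code}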



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24665) all wals of RegionGroupingProvider roll together

2020-07-06 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24665:
---
Fix Version/s: (was: 2.2.6)
   2.2.7

> all wals of RegionGroupingProvider roll together
> ---
>
> Key: HBASE-24665
> URL: https://issues.apache.org/jira/browse/HBASE-24665
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.3.0, master, 2.1.10, 1.4.14, 2.2.6
>Reporter: wenfeiyi666
>Assignee: wenfeiyi666
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.10, 1.4.14, 2.2.7
>
>
> When using RegionGroupingProvider, if any one wal requests a roll, all wals will be 
> rolled together.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-22146) SpaceQuotaViolationPolicy Disable is not working in Namespace level

2020-07-05 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-22146:
---
Fix Version/s: (was: 2.2.6)
   2.2.7

> SpaceQuotaViolationPolicy Disable is not working in Namespace level
> ---
>
> Key: HBASE-22146
> URL: https://issues.apache.org/jira/browse/HBASE-22146
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.0.0
>Reporter: Uma Maheswari
>Assignee: Surbhi Kochhar
>Priority: Major
>  Labels: Quota, space
> Fix For: 3.0.0-alpha-1, 2.2.7
>
>
> SpaceQuotaViolationPolicy Disable is not working in Namespace level
> PFB the steps:
>  * Create Namespace and set Quota violation policy as Disable
>  * Create tables under namespace and violate Quota
> Expected result: Tables to get disabled
> Actual Result: Tables are not getting disabled
> Note: mutation operation is not allowed on the table



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24503) Backport HBASE-24492 to all 2.x branch

2020-07-05 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24503:
---
Fix Version/s: (was: 2.2.6)
   2.2.7

> Backport HBASE-24492 to all 2.x branch
> --
>
> Key: HBASE-24503
> URL: https://issues.apache.org/jira/browse/HBASE-24503
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
> Fix For: 2.3.1, 2.1.10, 2.2.7
>
>
> After release 2.3.0 is out, we need to backport HBASE-24492 to all 2.x 
> branches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-22348) allow one to actually disable replication svc

2020-07-05 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-22348:
---
Fix Version/s: (was: 2.2.6)
   2.2.7

> allow one to actually disable replication svc
> -
>
> Key: HBASE-22348
> URL: https://issues.apache.org/jira/browse/HBASE-22348
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 2.2.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: replication
> Fix For: 2.2.7
>
> Attachments: HBASE-22348.patch
>
>
> Minor, but it does create extra ZK traffic for no reason, and there appears to be 
> no way to disable it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-22917) Proc-WAL roll fails always saying someone else has already created log

2020-07-05 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-22917:
---
Fix Version/s: (was: 2.2.6)
   2.2.7

> Proc-WAL roll fails always saying someone else has already created log
> --
>
> Key: HBASE-22917
> URL: https://issues.apache.org/jira/browse/HBASE-22917
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, wal
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 2.2.7
>
>
> Recently we hit a weird scenario where the Procedure WAL roll always fails saying the 
> log was already created by someone else.
> Later, while going through the logs and code, we observed that during the Proc-WAL 
> roll it had failed to write the header. On failure the file stream is just closed:
> {code}
> try {
>   ProcedureWALFormat.writeHeader(newStream, header);
>   startPos = newStream.getPos();
> } catch (IOException ioe) {
>   LOG.warn("Encountered exception writing header", ioe);
>   newStream.close();
>   return false;
> }
> {code}
> Since we don't delete the corrupted file or increment the *flushLogId*, each 
> retry tries to create the same *flushLogId* file. An HMaster failover will 
> resolve this issue, but we should handle it properly.
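One possible handling, sketched from the snippet above (newLogFile and fs are assumed names for the rolled file and its filesystem):
{code:java}
} catch (IOException ioe) {
  LOG.warn("Encountered exception writing header", ioe);
  newStream.close();
  // Illustrative: remove the partially written log so a later roll attempt can
  // recreate the same flushLogId file instead of failing with "already created".
  try {
    fs.delete(newLogFile, false);
  } catch (IOException e) {
    LOG.warn("Failed to remove partial log " + newLogFile, e);
  }
  return false;
}
{code}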



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-24403) FsDelegationToken should cache Token

2020-07-05 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-24403:
---
Fix Version/s: (was: 2.2.6)
   2.2.7

> FsDelegationToken should cache Token
> 
>
> Key: HBASE-24403
> URL: https://issues.apache.org/jira/browse/HBASE-24403
> Project: HBase
>  Issue Type: Bug
>Reporter: wuchang
>Assignee: wuchang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.2.7
>
> Attachments: 24403.patch
>
>
> When doing a bulk load, we find that FsDelegationToken acquires a NameNode 
> delegation token for every single file. The comment on acquireDelegationToken() 
> claims that it first tries to find a token in the cache, but the newly requested 
> token is never put into the cache, so the cache is still empty for the next 
> request.
> When there are many files to bulk load, these repeated token requests put 
> significant load on the NameNode.
>  
> {code:java}
> public void acquireDelegationToken(final FileSystem fs) throws IOException {
>   if (userProvider.isHadoopSecurityEnabled()) {
>     this.fs = fs;
>     userToken = userProvider.getCurrent().getToken("HDFS_DELEGATION_TOKEN",
>       fs.getCanonicalServiceName());
>     if (userToken == null) {
>       hasForwardedToken = false;
>       try {
>         userToken = fs.getDelegationToken(renewer);
>       } catch (NullPointerException npe) {
>         // we need to handle NullPointerException in case HADOOP-10009 is missing
>         LOG.error("Failed to get token for " + renewer);
>       }
>     } else {
>       hasForwardedToken = true;
>       LOG.info("Use the existing token: " + userToken);
>     }
>   }
> }
> {code}
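The general idea of the fix is to cache the freshly acquired token so that later
lookups find it instead of calling the NameNode again. Below is a minimal sketch of
that idea using plain Hadoop security APIs rather than HBase's UserProvider/User
wrappers; it is not the attached patch, and the class and method names
(CachedDelegationTokenExample, acquireOnce) are illustrative assumptions.

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;

/**
 * Sketch only: fetch an HDFS delegation token once per filesystem and cache it
 * in the current user's credentials, so later lookups do not hit the NameNode.
 */
public final class CachedDelegationTokenExample {

  private CachedDelegationTokenExample() {
  }

  public static Token<?> acquireOnce(FileSystem fs, String renewer) throws IOException {
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    String service = fs.getCanonicalServiceName();
    // First look for an already-cached token for this filesystem's service.
    if (service != null) {
      for (Token<?> t : ugi.getTokens()) {
        if (service.equals(t.getService().toString())) {
          return t;
        }
      }
    }
    // Not cached yet: ask the NameNode once, then remember the token so the
    // next call finds it in the credentials instead of asking again.
    Token<?> token = fs.getDelegationToken(renewer);
    if (token != null) {
      ugi.addToken(token);
    }
    return token;
  }
}
{code}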



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24682) Refactor ReplicationSource#addHFileRefs method: move it to ReplicationSourceManager

2020-07-05 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-24682:
--

Assignee: Guanghao Zhang

> Refactor ReplicationSource#addHFileRefs method: move it to 
> ReplicationSourceManager
> ---
>
> Key: HBASE-24682
> URL: https://issues.apache.org/jira/browse/HBASE-24682
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-24681) Remove the "cache" walsById/walsByIdRecoveredQueues from ReplicationSourceManager

2020-07-05 Thread Guanghao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang reassigned HBASE-24681:
--

Assignee: Guanghao Zhang

> Remove the "cache" walsById/walsByIdRecoveredQueues from 
> ReplicationSourceManager
> -
>
> Key: HBASE-24681
> URL: https://issues.apache.org/jira/browse/HBASE-24681
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24682) Refactor ReplicationSource#addHFileRefs method: move it to ReplicationSourceManager

2020-07-05 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24682:
--

 Summary: Refactor ReplicationSource#addHFileRefs method: move it 
to ReplicationSourceManager
 Key: HBASE-24682
 URL: https://issues.apache.org/jira/browse/HBASE-24682
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24681) Remove the "cache" walsById/walsByIdRecoveredQueues from ReplicationSourceManager

2020-07-05 Thread Guanghao Zhang (Jira)
Guanghao Zhang created HBASE-24681:
--

 Summary: Remove the "cache" walsById/walsByIdRecoveredQueues from 
ReplicationSourceManager
 Key: HBASE-24681
 URL: https://issues.apache.org/jira/browse/HBASE-24681
 Project: HBase
  Issue Type: Sub-task
Reporter: Guanghao Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23634) Enable "Split WAL to HFile" by default

2020-07-05 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151688#comment-17151688
 ] 

Guanghao Zhang commented on HBASE-23634:


{quote}Now making these files official HFiles (moving them under the region/cf 
directory) is the responsibility of the primary region. Only the primary region 
should do this. This will happen when the primary region is opened.
{quote}
Right.
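For anyone who wants this behavior before the default is flipped, a minimal sketch of 
enabling it explicitly is below. The property name hbase.wal.split.to.hfile is taken 
from the related split-to-HFile work and is an assumption here; verify it against the 
release in use.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public final class EnableWalSplitToHFileExample {
  public static void main(String[] args) {
    // Sketch: turn on "split WAL to HFile" explicitly; the property name is an
    // assumption based on the related feature work and may differ per release.
    Configuration conf = HBaseConfiguration.create();
    conf.setBoolean("hbase.wal.split.to.hfile", true);
    System.out.println("hbase.wal.split.to.hfile = "
        + conf.getBoolean("hbase.wal.split.to.hfile", false));
  }
}
{code}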

> Enable "Split WAL to HFile" by default
> --
>
> Key: HBASE-23634
> URL: https://issues.apache.org/jira/browse/HBASE-23634
> Project: HBase
>  Issue Type: Task
>Affects Versions: 3.0.0-alpha-1, 2.3.0
>Reporter: Guanghao Zhang
>Priority: Blocker
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-11288) Splittable Meta

2020-07-02 Thread Guanghao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149947#comment-17149947
 ] 

Guanghao Zhang commented on HBASE-11288:


The key question is whether to store the ROOT table as a "general hbase table" 
or as a "master local region", right? We need to agree on this design first.

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

