[jira] [Resolved] (HBASE-28330) TestUnknownServers.testListUnknownServers is flaky in branch-2

2024-01-25 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-28330.
-
Fix Version/s: 2.6.0
               2.5.8
   Resolution: Fixed

Pushed to branch-2, branch-2.5, branch-2.6. Thanks for the review [~zhangduo] 

> TestUnknownServers.testListUnknownServers is flaky in branch-2
> --
>
> Key: HBASE-28330
> URL: https://issues.apache.org/jira/browse/HBASE-28330
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.5.7
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 2.6.0, 2.5.8
>
>
> {code:java}
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.913 
> s <<< FAILURE! - in org.apache.hadoop.hbase.master.TestUnknownServers
> [ERROR] 
> org.apache.hadoop.hbase.master.TestUnknownServers.testListUnknownServers  
> Time elapsed: 0.204 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<2> {code}
> The value of TestUnknownServers.SLAVES is different between 
> [branch-2|https://github.com/apache/hbase/blob/68bc533f7116cedc681704b82319e5793b827621/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestUnknownServers.java#L44]
>  and 
> [master|https://github.com/apache/hbase/blob/b87b05c847f00c292664d894c21f83c73d48460d/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestUnknownServers.java#L43].
> It is 1 in master but 2 in branch-2.
> The RegionServer marked UNKNOWN_SERVER is the one that *holds regions* but is 
> not tracked by the ServerManager.
> Please see HMaster.getUnknownServers
> {code:java}
> private List<ServerName> getUnknownServers() {
>   if (serverManager != null) {
>     final Set<ServerName> serverNames =
>       getAssignmentManager().getRegionStates().getRegionStates()
>         .stream().map(RegionState::getServerName).collect(Collectors.toSet());
>     final List<ServerName> unknownServerNames = serverNames.stream()
>       .filter(sn -> sn != null && serverManager.isServerUnknown(sn))
>       .collect(Collectors.toList());
>     return unknownServerNames;
>   }
>   return null;
> } {code}
> In the UT TestUnknownServers.testListUnknownServers, we start an HBase cluster 
> with 2 RegionServers. If all regions are assigned to ONE server, then only that 
> server is reported as UNKNOWN_SERVER, and the UT fails.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28330) TestUnknownServers.testListUnknownServers is flaky in branch-2

2024-01-25 Thread Sun Xin (Jira)
Sun Xin created HBASE-28330:
---

 Summary: TestUnknownServers.testListUnknownServers is flaky in 
branch-2
 Key: HBASE-28330
 URL: https://issues.apache.org/jira/browse/HBASE-28330
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.5.7
Reporter: Sun Xin
Assignee: Sun Xin


{code:java}
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.913 s 
<<< FAILURE! - in org.apache.hadoop.hbase.master.TestUnknownServers
[ERROR] 
org.apache.hadoop.hbase.master.TestUnknownServers.testListUnknownServers  Time 
elapsed: 0.204 s  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<2> {code}
The value of TestUnknownServers.SLAVES is different between branch-2 and master.

It is 1 in master but 2 in branch-2.

The RegionServer marked UNKNOWN_SERVER is the one that *holds regions* but is 
not tracked by the ServerManager.

Please see HMaster.getUnknownServers
{code:java}
private List<ServerName> getUnknownServers() {
  if (serverManager != null) {
    final Set<ServerName> serverNames =
      getAssignmentManager().getRegionStates().getRegionStates()
        .stream().map(RegionState::getServerName).collect(Collectors.toSet());
    final List<ServerName> unknownServerNames = serverNames.stream()
      .filter(sn -> sn != null && serverManager.isServerUnknown(sn))
      .collect(Collectors.toList());
    return unknownServerNames;
  }
  return null;
} {code}
In the UT TestUnknownServers.testListUnknownServers, we start an HBase cluster with 
2 RegionServers. If all regions are assigned to ONE server, then only that server 
is reported as UNKNOWN_SERVER, and the UT fails.
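
The dependence on region placement can be reproduced with a small standalone sketch. This is not the HBase code: server names are plain strings, the map and set arguments are hypothetical stand-ins for RegionStates and ServerManager, and "unknown" is simplified to "not in the tracked set".

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

// Simplified stand-in for the logic of HMaster.getUnknownServers().
final class UnknownServersSketch {

  // regionToServer: the server each region is recorded on (null = not assigned).
  // trackedServers: servers a hypothetical ServerManager still knows about.
  static List<String> unknownServers(Map<String, String> regionToServer,
      Set<String> trackedServers) {
    return regionToServer.values().stream()
      .filter(Objects::nonNull)                    // skip unassigned regions
      .distinct()                                  // one entry per server
      .filter(sn -> !trackedServers.contains(sn))  // holds regions but is not tracked
      .sorted()
      .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Set<String> tracked = new HashSet<>(); // neither server is tracked any more

    // Regions spread over both servers: both are reported.
    Map<String, String> spread = new LinkedHashMap<>();
    spread.put("region-a", "rs1");
    spread.put("region-b", "rs2");
    System.out.println(unknownServers(spread, tracked)); // [rs1, rs2]

    // All regions on one server: the other server holds nothing and is never reported.
    Map<String, String> skewed = new LinkedHashMap<>();
    skewed.put("region-a", "rs1");
    skewed.put("region-b", "rs1");
    System.out.println(unknownServers(skewed, tracked)); // [rs1]
  }
}
```

With two servers the result size is 1 or 2 depending on where the regions land, so an assertion on a fixed count can only be stable when the cluster has a single RegionServer.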





[jira] [Resolved] (HBASE-28324) TestRegionNormalizerWorkQueue#testTake is flaky

2024-01-21 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-28324.
-
Fix Version/s: 2.6.0
               2.4.18
               2.5.8
               3.0.0-beta-2
   Resolution: Fixed

Pushed to all active branches. Thanks for the review [~zhangduo] 

> TestRegionNormalizerWorkQueue#testTake is flaky
> ---
>
> Key: HBASE-28324
> URL: https://issues.apache.org/jira/browse/HBASE-28324
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0-beta-1, 2.5.7
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 2.6.0, 2.4.18, 2.5.8, 3.0.0-beta-2
>
>






[jira] [Created] (HBASE-28324) TestRegionNormalizerWorkQueue#testTake is flaky

2024-01-19 Thread Sun Xin (Jira)
Sun Xin created HBASE-28324:
---

 Summary: TestRegionNormalizerWorkQueue#testTake is flaky
 Key: HBASE-28324
 URL: https://issues.apache.org/jira/browse/HBASE-28324
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 2.5.7, 3.0.0-beta-1
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Resolved] (HBASE-27469) IllegalArgumentException is thrown by SnapshotScannerHDFSAclController when dropping a table

2022-11-14 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-27469.
-
Fix Version/s: 2.5.2
               2.4.16
               (was: 2.6.0)
   Resolution: Fixed

> IllegalArgumentException is thrown by SnapshotScannerHDFSAclController when 
> dropping a table
> 
>
> Key: HBASE-27469
> URL: https://issues.apache.org/jira/browse/HBASE-27469
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0-alpha-3, 2.5.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-4, 2.5.2, 2.4.16
>
>
> If the scan snapshot feature is enabled and permissions on a table and on a 
> namespace are granted to the same user, an IllegalArgumentException is 
> thrown when dropping tables.





[jira] [Created] (HBASE-27476) Recovered replication may be blocked if enabled hbase.separate.oldlogdir.by.regionserver

2022-11-08 Thread Sun Xin (Jira)
Sun Xin created HBASE-27476:
---

 Summary: Recovered replication may be blocked if enabled 
hbase.separate.oldlogdir.by.regionserver
 Key: HBASE-27476
 URL: https://issues.apache.org/jira/browse/HBASE-27476
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.4.15, 3.0.0-alpha-3
Reporter: Sun Xin
Assignee: Sun Xin


In another PR, I got a failed UT:
{code:java}
[ERROR] Failures: 
[ERROR] 
org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSWithSeparateOldWALs.killOneMasterRS
[ERROR]   Run 1: 
TestReplicationKillMasterRSWithSeparateOldWALs>TestReplicationKillMasterRS.killOneMasterRS:47->TestReplicationKillRS.loadTableAndKillRS:84
 Waited too much time for queueFailover replication. Waited 61065ms.
[ERROR]   Run 2: 
TestReplicationKillMasterRSWithSeparateOldWALs>TestReplicationKillMasterRS.killOneMasterRS:47->TestReplicationKillRS.loadTableAndKillRS:84
 Waited too much time for queueFailover replication. Waited 58864ms.
[ERROR]   Run 3: 
TestReplicationKillMasterRSWithSeparateOldWALs>TestReplicationKillMasterRS.killOneMasterRS:47->TestReplicationKillRS.loadTableAndKillRS:84
 Waited too much time for queueFailover replication. Waited 57103ms. {code}
This is likely caused by a bug.

If {_}hbase.separate.oldlogdir.by.regionserver{_} is enabled, old WALs are 
moved into per-regionserver directories, e.g. root/oldWALs/server1/wal1. 
Recovered replication can't convert a WAL path (like root/oldWALs/wal1) into 
such a path, and throws FileNotFoundException.
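
The mismatch between the two layouts can be illustrated with a small sketch. This is only a model of the lookup, not the HBase fix: the class, the `locate` helper and the in-memory set standing in for the file system are all hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Models where an archived WAL may live when old WALs are separated by regionserver.
final class OldWalPathSketch {

  // Tries the flat layout first, then the per-server layout.
  // 'existing' is an in-memory stand-in for the file system.
  static Optional<Path> locate(Path oldWalsDir, String serverName, String walName,
      Set<Path> existing) {
    Path flat = oldWalsDir.resolve(walName);
    if (existing.contains(flat)) {
      return Optional.of(flat);
    }
    Path perServer = oldWalsDir.resolve(serverName).resolve(walName);
    if (existing.contains(perServer)) {
      return Optional.of(perServer);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Path oldWals = Paths.get("root", "oldWALs");
    Set<Path> fs = new HashSet<>();
    fs.add(oldWals.resolve("server1").resolve("wal1")); // archived under the server name

    // A lookup that only knows the flat path finds nothing.
    System.out.println(fs.contains(oldWals.resolve("wal1")));                 // false
    // A lookup that also tries the per-server directory finds the file.
    System.out.println(locate(oldWals, "server1", "wal1", fs).isPresent());   // true
  }
}
```

The sketch assumes the owning server name is available to the caller; whether and how recovered replication can obtain it is exactly what the issue is about.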





[jira] [Created] (HBASE-27469) IllegalArgumentException is thrown by SnapshotScannerHDFSAclController when dropping a table

2022-11-07 Thread Sun Xin (Jira)
Sun Xin created HBASE-27469:
---

 Summary: IllegalArgumentException is thrown by 
SnapshotScannerHDFSAclController when dropping a table
 Key: HBASE-27469
 URL: https://issues.apache.org/jira/browse/HBASE-27469
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.5.1, 3.0.0-alpha-3
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 2.6.0, 3.0.0-alpha-4


If the scan snapshot feature is enabled and permissions on a table and on a 
namespace are granted to the same user, an IllegalArgumentException is thrown 
when dropping tables.





[jira] [Created] (HBASE-27354) EOF thrown by WALEntryStream causes replication blocking

2022-09-01 Thread Sun Xin (Jira)
Sun Xin created HBASE-27354:
---

 Summary: EOF thrown by WALEntryStream causes replication blocking
 Key: HBASE-27354
 URL: https://issues.apache.org/jira/browse/HBASE-27354
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.4.14, 3.0.0-alpha-3, 2.5.0, 2.6.0
Reporter: Sun Xin
Assignee: Sun Xin


In 
[WALEntryStream#readNextEntryAndRecordReaderPosition|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L257],
 it is possible that we read uncommitted data. If we read beyond the committed 
file length, we reopen the inputStream and seek back.

In our deployment, we found that the position we seek back to may be exactly the 
length of the file being written, which may cause an EOF.

The thrown EOF is eventually caught in 
[ReplicationSourceWALReader.run|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L158],
 but 
[totalBufferUsed|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L78]
 is not cleaned up.

After a long run, all peers will go slow and eventually block completely.
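
A minimal sketch of how such a leak accumulates, assuming a shared counter that is incremented when a batch is read and decremented once the batch is done. The class and method names are hypothetical and the EOF is simulated; this is not the ReplicationSourceWALReader code.

```java
import java.io.EOFException;
import java.util.concurrent.atomic.AtomicLong;

// Models a shared buffer quota that must be released on every exit path.
final class BufferQuotaSketch {

  // Acquires quota, then fails before the release: the quota is leaked.
  static void readBatchLeaky(AtomicLong totalBufferUsed, long batchSize, boolean failWithEof)
      throws EOFException {
    totalBufferUsed.addAndGet(batchSize);
    if (failWithEof) {
      throw new EOFException("simulated EOF while reading the WAL");
    }
    totalBufferUsed.addAndGet(-batchSize); // only reached on the success path
  }

  // Same steps, but the release sits in a finally block.
  static void readBatchSafe(AtomicLong totalBufferUsed, long batchSize, boolean failWithEof)
      throws EOFException {
    totalBufferUsed.addAndGet(batchSize);
    try {
      if (failWithEof) {
        throw new EOFException("simulated EOF while reading the WAL");
      }
    } finally {
      totalBufferUsed.addAndGet(-batchSize); // released whether or not the read failed
    }
  }

  public static void main(String[] args) {
    AtomicLong leaky = new AtomicLong();
    AtomicLong safe = new AtomicLong();
    for (int i = 0; i < 3; i++) {
      try {
        readBatchLeaky(leaky, 100, true);
      } catch (EOFException e) {
        // caught by the caller, as in the description above
      }
      try {
        readBatchSafe(safe, 100, true);
      } catch (EOFException e) {
        // caught by the caller
      }
    }
    System.out.println(leaky.get()); // 300
    System.out.println(safe.get());  // 0
  }
}
```

Because the counter is shared, every leaked batch reduces the quota available to all peers, which matches the gradual slowdown described above.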





[jira] [Resolved] (HBASE-26956) ExportSnapshot tool supports removing TTL

2022-06-20 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-26956.
-
Fix Version/s: 2.5.0
               2.6.0
   Resolution: Done

Pushed to branch-2 and branch-2.5

> ExportSnapshot tool supports removing TTL
> -
>
> Key: HBASE-26956
> URL: https://issues.apache.org/jira/browse/HBASE-26956
> Project: HBase
>  Issue Type: New Feature
>  Components: snapshots
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3
>
>
> In our scenario, we use ExportSnapshot to copy snapshots to cold storage like 
> S3. But when we restore a snapshot back to the HBase cluster, it is deleted 
> immediately because its TTL is set.
> So we need the ExportSnapshot tool to support removing the TTL.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Reopened] (HBASE-26956) ExportSnapshot tool supports removing TTL

2022-06-15 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin reopened HBASE-26956:
-

Will close this issue after porting to branch-2.x.

> ExportSnapshot tool supports removing TTL
> -
>
> Key: HBASE-26956
> URL: https://issues.apache.org/jira/browse/HBASE-26956
> Project: HBase
>  Issue Type: New Feature
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-3
>
>
> In our scenario, we use ExportSnapshot to copy snapshots to cold storage like 
> S3. But when we restore a snapshot back to the HBase cluster, it is deleted 
> immediately because its TTL is set.
> So we need the ExportSnapshot tool to support removing the TTL.





[jira] [Resolved] (HBASE-26956) ExportSnapshot tool supports removing TTL

2022-06-15 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-26956.
-
Fix Version/s: 3.0.0-alpha-4
 Release Note: The ExportSnapshot tool supports removing the TTL of a snapshot. 
When using the ExportSnapshot tool to recover a snapshot with a TTL from cold 
storage to an HBase cluster, set `-reset-ttl` to prevent the snapshot from being 
deleted immediately.
   Resolution: Done

Thanks for the review, [~zhangduo].

> ExportSnapshot tool supports removing TTL
> -
>
> Key: HBASE-26956
> URL: https://issues.apache.org/jira/browse/HBASE-26956
> Project: HBase
>  Issue Type: New Feature
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-4
>
>
> In our scenario, we use ExportSnapshot to copy snapshots to cold storage like 
> S3. But when we restore a snapshot back to the HBase cluster, it is deleted 
> immediately because its TTL is set.
> So we need the ExportSnapshot tool to support removing the TTL.





[jira] [Created] (HBASE-26956) ExportSnapshot tool supports removing TTL

2022-04-15 Thread Sun Xin (Jira)
Sun Xin created HBASE-26956:
---

 Summary: ExportSnapshot tool supports removing TTL
 Key: HBASE-26956
 URL: https://issues.apache.org/jira/browse/HBASE-26956
 Project: HBase
  Issue Type: New Feature
Reporter: Sun Xin
Assignee: Sun Xin


In our scenario, we use ExportSnapshot to copy snapshots to cold storage like 
S3. But when we restore a snapshot back to the HBase cluster, it is deleted 
immediately because its TTL is set.

So we need the ExportSnapshot tool to support removing the TTL.
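
Why a restored snapshot disappears right away can be seen from a simple expiry check. The sketch below is a model, not the HBase cleaner code; it assumes the TTL is stored in seconds alongside the original creation time and that a TTL of 0 or less means "never expires".

```java
// Models the expiry decision for a snapshot that carries its original creation time and TTL.
final class SnapshotTtlSketch {

  // Assumption for this sketch: ttlSeconds <= 0 means the snapshot never expires.
  static boolean isExpired(long creationTimeMs, long ttlSeconds, long nowMs) {
    if (ttlSeconds <= 0) {
      return false;
    }
    return creationTimeMs + ttlSeconds * 1000L < nowMs;
  }

  public static void main(String[] args) {
    long dayMs = 24L * 60 * 60 * 1000;
    long createdMs = 0L;                   // snapshot taken at t = 0
    long ttlSeconds = 7L * 24 * 60 * 60;   // one-week TTL kept in the snapshot metadata
    long restoredAtMs = 30 * dayMs;        // copied back from cold storage a month later

    // With the original TTL the snapshot is already past its deadline when it comes back.
    System.out.println(isExpired(createdMs, ttlSeconds, restoredAtMs)); // true
    // With the TTL removed during export it survives.
    System.out.println(isExpired(createdMs, 0L, restoredAtMs));         // false
  }
}
```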



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (HBASE-26406) Can not add peer replicating to non-HBase

2021-11-02 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-26406.
-
Fix Version/s: 2.4.9
               3.0.0-alpha-2
   Resolution: Fixed

Pushed to master and 2.x branches. Thanks all for reviewing.

> Can not add peer replicating to non-HBase
> -
>
> Key: HBASE-26406
> URL: https://issues.apache.org/jira/browse/HBASE-26406
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-2, 2.4.9
>
>
> Adding a peer that replicates to a non-HBase target (like MQ) through a custom 
> ReplicationEndpoint fails. I got an exception like this in my UT: 
> {code:java}
> 2021-10-29T15:14:47,632 INFO  [RPCClient-NioEventLoopGroup-5-3] client.RawAsyncHBaseAdmin$ReplicationProcedureBiConsumer(2761): Operation: ADD_REPLICATION_PEER, peerId: 1 failed with Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
> 2021-10-29T15:14:47,632 INFO  [RPCClient-NioEventLoopGroup-5-3] client.RawAsyncHBaseAdmin$ReplicationProcedureBiConsumer(2761): Operation: ADD_REPLICATION_PEER, peerId: 1 failed with Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
> org.apache.hadoop.hbase.DoNotRetryIOException: Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
>     at java.lang.Thread.getStackTrace(Thread.java:1559)
>     at org.apache.hadoop.hbase.util.FutureUtils.setStackTrace(FutureUtils.java:130)
>     at org.apache.hadoop.hbase.util.FutureUtils.rethrow(FutureUtils.java:149)
>     at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:186)
>     at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1948)
>     at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1936)
>     at org.apache.hadoop.hbase.replication.TestNonHBaseReplicationEndpoint.test(TestNonHBaseReplicationEndpoint.java:97)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>     at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>     at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>     at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>     at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>     at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>     at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>     at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>     at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:38)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.lang.Thread.run(Thread.java:748)
>     at Future.get(Unknown Source)
>     at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkClusterId(ReplicationPeerManager.java:527)
>     at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkPeerConfig(ReplicationPeerManager.java:367)
>     at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.preAddPeer(ReplicationPeerManager.java:123)
>     at org.apache.hadoop.hbase.master.replication.AddPeerProcedure.prePeerModification(AddPeerProcedure.java:101)
>     at org.apache.hadoop.hbase.master.replication.ModifyPeerProcedure.executeFromState(ModifyPeerProcedure.java:162)
>  at 
> 

[jira] [Created] (HBASE-26406) Can not add peer replicating to non-HBase

2021-10-29 Thread Sun Xin (Jira)
Sun Xin created HBASE-26406:
---

 Summary: Can not add peer replicating to non-HBase
 Key: HBASE-26406
 URL: https://issues.apache.org/jira/browse/HBASE-26406
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.4.0, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin


Adding a peer that replicates to a non-HBase target (like MQ) through a custom 
ReplicationEndpoint fails. I got an exception like this in my UT: 
{code:java}
2021-10-29T15:14:47,632 INFO  [RPCClient-NioEventLoopGroup-5-3] client.RawAsyncHBaseAdmin$ReplicationProcedureBiConsumer(2761): Operation: ADD_REPLICATION_PEER, peerId: 1 failed with Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
2021-10-29T15:14:47,632 INFO  [RPCClient-NioEventLoopGroup-5-3] client.RawAsyncHBaseAdmin$ReplicationProcedureBiConsumer(2761): Operation: ADD_REPLICATION_PEER, peerId: 1 failed with Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
org.apache.hadoop.hbase.DoNotRetryIOException: Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
    at java.lang.Thread.getStackTrace(Thread.java:1559)
    at org.apache.hadoop.hbase.util.FutureUtils.setStackTrace(FutureUtils.java:130)
    at org.apache.hadoop.hbase.util.FutureUtils.rethrow(FutureUtils.java:149)
    at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:186)
    at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1948)
    at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1936)
    at org.apache.hadoop.hbase.replication.TestNonHBaseReplicationEndpoint.test(TestNonHBaseReplicationEndpoint.java:97)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:38)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:748)
    at Future.get(Unknown Source)
    at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkClusterId(ReplicationPeerManager.java:527)
    at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkPeerConfig(ReplicationPeerManager.java:367)
    at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.preAddPeer(ReplicationPeerManager.java:123)
    at org.apache.hadoop.hbase.master.replication.AddPeerProcedure.prePeerModification(AddPeerProcedure.java:101)
    at org.apache.hadoop.hbase.master.replication.ModifyPeerProcedure.executeFromState(ModifyPeerProcedure.java:162)
    at org.apache.hadoop.hbase.master.replication.ModifyPeerProcedure.executeFromState(ModifyPeerProcedure.java:43)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:190)
    at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1667)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
 at 

[jira] [Resolved] (HBASE-25773) TestSnapshotScannerHDFSAclController.setupBeforeClass is flaky

2021-09-02 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25773.
-
Resolution: Fixed

Pushed to branch-2 and master, thanks [~zhangduo] for reviewing.

> TestSnapshotScannerHDFSAclController.setupBeforeClass is flaky
> --
>
> Key: HBASE-25773
> URL: https://issues.apache.org/jira/browse/HBASE-25773
> Project: HBase
>  Issue Type: Improvement
>Reporter: Xiaolin Ha
>Assignee: Sun Xin
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3140/2/testReport/org.apache.hadoop.hbase.security.access/TestSnapshotScannerHDFSAclController/precommit_checks___yetus_jdk8_Hadoop3_checks__/]
> SnapshotScannerHDFSAclController.postStartMaster alters hbase:acl to add a 
> new cf "m", but 
> `TestSnapshotScannerHDFSAclController.setupBeforeClass(TestSnapshotScannerHDFSAclController.java:101)`
>  fails before the disable and enable of hbase:acl complete.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-26194) Introduce a ReplicationServerSourceManager to simplify HReplicationServer

2021-08-17 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-26194.
-
Resolution: Done

Merged. Thanks [~stack] for reviewing.

> Introduce a ReplicationServerSourceManager to simplify HReplicationServer
> -
>
> Key: HBASE-26194
> URL: https://issues.apache.org/jira/browse/HBASE-26194
> Project: HBase
>  Issue Type: Sub-task
>  Components: Replication
>Affects Versions: 3.0.0-alpha-2
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-2
>
>






[jira] [Created] (HBASE-26194) Introduce a ReplicationServerSourceManager to simplify HReplicationServer

2021-08-12 Thread Sun Xin (Jira)
Sun Xin created HBASE-26194:
---

 Summary: Introduce a ReplicationServerSourceManager to simplify 
HReplicationServer
 Key: HBASE-26194
 URL: https://issues.apache.org/jira/browse/HBASE-26194
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Affects Versions: 3.0.0-alpha-2
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-2








[jira] [Resolved] (HBASE-26084) Add owner of replication queue for ReplicationQueueInfo

2021-08-12 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-26084.
-
Fix Version/s: 3.0.0-alpha-2
   Resolution: Done

Merged.

Thanks [~stack] [~zhangduo] for reviewing.

> Add owner of replication queue for ReplicationQueueInfo
> ---
>
> Key: HBASE-26084
> URL: https://issues.apache.org/jira/browse/HBASE-26084
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-2
>
>
> The current ReplicationQueueInfo only has a queueId, which is not enough to 
> distinguish queues in ReplicationServer, so we need to add the RS holding 
> the queue to ReplicationQueueInfo.





[jira] [Created] (HBASE-26084) Add owner of replication queue for ReplicationQueueInfo

2021-07-13 Thread Sun Xin (Jira)
Sun Xin created HBASE-26084:
---

 Summary: Add owner of replication queue for ReplicationQueueInfo
 Key: HBASE-26084
 URL: https://issues.apache.org/jira/browse/HBASE-26084
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


The current ReplicationQueueInfo only has a queueId, which is not enough to 
distinguish queues in ReplicationServer, so we need to add the RS holding the 
queue to ReplicationQueueInfo.
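
A small sketch of why the queueId alone is ambiguous once queues from several RegionServers meet in one place. The class below is hypothetical and only models the idea of keying a queue by owner plus queueId; it is not the ReplicationQueueInfo API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key: the RS that holds the queue plus the queue id.
final class QueueKeySketch {
  private final String owner;
  private final String queueId;

  QueueKeySketch(String owner, String queueId) {
    this.owner = Objects.requireNonNull(owner);
    this.queueId = Objects.requireNonNull(queueId);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof QueueKeySketch)) {
      return false;
    }
    QueueKeySketch other = (QueueKeySketch) o;
    return owner.equals(other.owner) && queueId.equals(other.queueId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(owner, queueId);
  }

  public static void main(String[] args) {
    // Keyed by queueId only: the queue of rs2 overwrites the queue of rs1.
    Map<String, String> byQueueId = new HashMap<>();
    byQueueId.put("1", "queue held by rs1");
    byQueueId.put("1", "queue held by rs2");
    System.out.println(byQueueId.size()); // 1

    // Keyed by owner and queueId: both queues are kept apart.
    Map<QueueKeySketch, String> byOwnerAndId = new HashMap<>();
    byOwnerAndId.put(new QueueKeySketch("rs1", "1"), "queue held by rs1");
    byOwnerAndId.put(new QueueKeySketch("rs2", "1"), "queue held by rs2");
    System.out.println(byOwnerAndId.size()); // 2
  }
}
```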





[jira] [Resolved] (HBASE-25110) Add heartbeat for ReplicationServer and dispatch replication sources to ReplicationServer

2021-07-09 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25110.
-
Release Note: This issue was split into two, HBASE-26077 and 
HBASE-26078
  Resolution: Incomplete

> Add heartbeat for ReplicationServer and dispatch replication sources to 
> ReplicationServer
> -
>
> Key: HBASE-25110
> URL: https://issues.apache.org/jira/browse/HBASE-25110
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Sun Xin
>Priority: Major
>






[jira] [Created] (HBASE-26078) Dispatch replication sources to ReplicationServer

2021-07-09 Thread Sun Xin (Jira)
Sun Xin created HBASE-26078:
---

 Summary: Dispatch replication sources to ReplicationServer
 Key: HBASE-26078
 URL: https://issues.apache.org/jira/browse/HBASE-26078
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1








[jira] [Created] (HBASE-26077) Add heartbeat for ReplicationServer

2021-07-09 Thread Sun Xin (Jira)
Sun Xin created HBASE-26077:
---

 Summary: Add heartbeat for ReplicationServer
 Key: HBASE-26077
 URL: https://issues.apache.org/jira/browse/HBASE-26077
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1








[jira] [Resolved] (HBASE-25807) Move method reportProcedureDone from RegionServerStatus.proto to Master.proto

2021-05-23 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25807.
-
Fix Version/s: 3.0.0-alpha-1
   Resolution: Done

Merged. Thanks [~zhangduo] for reviewing.

> Move method reportProcedureDone from RegionServerStatus.proto to Master.proto
> -
>
> Key: HBASE-25807
> URL: https://issues.apache.org/jira/browse/HBASE-25807
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Next we need to use the procedure mechanism to implement enable/disable/refresh 
> peer, and ReplicationServer also needs to call reportProcedureDone on the master, 
> so I propose moving the method reportProcedureDone from 
> RegionServerStatus.proto to Master.proto.





[jira] [Created] (HBASE-25820) Find a way to know whether logQueue goes empty when ReplicationSource is running on ReplicationServer

2021-04-28 Thread Sun Xin (Jira)
Sun Xin created HBASE-25820:
---

 Summary: Find a way to know whether logQueue goes empty when 
ReplicationSource is running on ReplicationServer
 Key: HBASE-25820
 URL: https://issues.apache.org/jira/browse/HBASE-25820
 Project: HBase
  Issue Type: Sub-task
Reporter: Sun Xin


In HBASE-25110 we chose to use ZK to notify ReplicationServer that a new WAL was 
generated, which is asynchronous. This leads to a problem: the shipper thread 
and the WAL reader thread may terminate because logQueue goes empty before 
the notification of the new WAL is received.

So we now need to find a way to know whether logQueue is really empty after the 
last WAL in logQueue is consumed.
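
The race can be shown without threads by replaying the events in the problematic order. The class and method names below are hypothetical and the queue is a plain in-memory deque; the sketch only illustrates the problem, not a fix.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Replays the order of events in which a reader stops before a late WAL notification arrives.
final class LogQueueRaceSketch {

  // Consumes entries until the queue is empty and returns how many were read.
  // The reader treats "empty" as "finished" and stops.
  static int drainUntilEmpty(Queue<String> logQueue) {
    int consumed = 0;
    while (logQueue.poll() != null) {
      consumed++;
    }
    return consumed;
  }

  public static void main(String[] args) {
    Queue<String> logQueue = new ArrayDeque<>();
    logQueue.add("wal-1");

    // The reader consumes the last known WAL and terminates.
    int consumed = drainUntilEmpty(logQueue);

    // Only now does the asynchronous notification about the next WAL arrive.
    logQueue.add("wal-2");

    // Nobody is left to read wal-2.
    System.out.println(consumed + " consumed, " + logQueue.size() + " left unread");
    // prints "1 consumed, 1 left unread"
  }
}
```

An empty queue therefore cannot by itself tell the reader that the source is finished; some additional signal is needed, which is what this issue asks for.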





[jira] [Resolved] (HBASE-24737) Find a way to resolve WALFileLengthProvider#getLogFileSizeIfBeingWritten problem

2021-04-26 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-24737.
-
Resolution: Done

> Find a way to resolve WALFileLengthProvider#getLogFileSizeIfBeingWritten 
> problem
> 
>
> Key: HBASE-24737
> URL: https://issues.apache.org/jira/browse/HBASE-24737
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Sun Xin
>Priority: Major
>
> We currently use WALFileLengthProvider#getLogFileSizeIfBeingWritten to get the 
> synced WAL length and avoid replicating unacked log entries. But after 
> offloading ReplicationSource to the new ReplicationServer, we need a new way to 
> solve this problem.





[jira] [Created] (HBASE-25807) Move method reportProcedureDone from RegionServerStatus.proto to Master.proto

2021-04-23 Thread Sun Xin (Jira)
Sun Xin created HBASE-25807:
---

 Summary: Move method reportProcedureDone from 
RegionServerStatus.proto to Master.proto
 Key: HBASE-25807
 URL: https://issues.apache.org/jira/browse/HBASE-25807
 Project: HBase
  Issue Type: Sub-task
Reporter: Sun Xin


Next we need to use the procedure mechanism to implement enable/disable/refresh 
peer, and ReplicationServer also needs to report procedure completion to the 
master via reportProcedureDone, so I propose moving the reportProcedureDone 
method from RegionServerStatus.proto to Master.proto.





[jira] [Resolved] (HBASE-25562) ReplicationSourceWALReader log and handle exception immediately without retrying

2021-03-25 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25562.
-
Fix Version/s: 2.4.3
   2.3.5
   3.0.0-alpha-1
   Resolution: Fixed

> ReplicationSourceWALReader log and handle exception immediately without 
> retrying
> 
>
> Key: HBASE-25562
> URL: https://issues.apache.org/jira/browse/HBASE-25562
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.2.6, 2.3.4, 2.4.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.5, 2.4.3
>
>
> In [this piece of code about retrying in 
> ReplicationSourceWALReader#run|https://github.com/apache/hbase/blob/0353909bc268e3ff3def098963d021e973f1f153/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L151],
>  the sleep time increases with the number of retries. If an exception occurs 
> that cannot recover by itself, error logs only appear after about 12 hours 
> (300 retries by default).





[jira] [Created] (HBASE-25683) Simplify UTs using DummyServer

2021-03-19 Thread Sun Xin (Jira)
Sun Xin created HBASE-25683:
---

 Summary: Simplify UTs using DummyServer
 Key: HBASE-25683
 URL: https://issues.apache.org/jira/browse/HBASE-25683
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Resolved] (HBASE-25638) The master local region is constantly major compact

2021-03-05 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25638.
-
Resolution: Not A Problem

> The master local region is constantly major compact
> ---
>
> Key: HBASE-25638
> URL: https://issues.apache.org/jira/browse/HBASE-25638
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.3.4, 2.4.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
>
> In 
> [MasterRegionFlusherAndCompactor.compact|https://github.com/apache/hbase/blob/830d2895b27fa0cf39a28d3af9673a4126ea8258/hbase-server/src/main/java/org/apache/hadoop/hbase/master/region/MasterRegionFlusherAndCompactor.java#L164],
>  we call region.compact(true) repeatedly, much like a recursion. This produces 
> a flood of log messages.





[jira] [Created] (HBASE-25638) The master local region is constantly major compact

2021-03-05 Thread Sun Xin (Jira)
Sun Xin created HBASE-25638:
---

 Summary: The master local region is constantly major compact
 Key: HBASE-25638
 URL: https://issues.apache.org/jira/browse/HBASE-25638
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.4.1, 2.3.4, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin


In 
[MasterRegionFlusherAndCompactor.compact|https://github.com/apache/hbase/blob/830d2895b27fa0cf39a28d3af9673a4126ea8258/hbase-server/src/main/java/org/apache/hadoop/hbase/master/region/MasterRegionFlusherAndCompactor.java#L164],
 we call region.compact(true) repeatedly, much like a recursion. This produces 
a flood of log messages.





[jira] [Resolved] (HBASE-25598) TestFromClientSide5.testScanMetrics is flaky

2021-02-23 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25598.
-
Fix Version/s: 2.4.2
   2.3.5
   2.2.7
   3.0.0-alpha-1
   Resolution: Fixed

Thanks [~zhangduo] for reviewing.

Merged to master and all active branch-2.x.

> TestFromClientSide5.testScanMetrics is flaky
> 
>
> Key: HBASE-25598
> URL: https://issues.apache.org/jira/browse/HBASE-25598
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.3.4, 2.4.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.3.5, 2.4.2
>
>
> In some PRs, I got the following errors in UT results.
> {code:java}
> [ERROR] Errors: 
> [ERROR] org.apache.hadoop.hbase.client.TestFromClientSide5.testScanMetrics[0]
> [ERROR]   Run 1: TestFromClientSide5.testScanMetrics:1018 Did not count the 
> result bytes expected:<60> but was:<120>
> [ERROR]   Run 2: TestFromClientSide5.testScanMetrics:1036 Did not count the 
> result bytes expected:<60> but was:<180>
> [ERROR]   Run 3: TestFromClientSide5.testScanMetrics:951 » 
> MasterRegistryFetch Exception making...
> [INFO] 
> [ERROR] 
> org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor5.testScanMetrics[1]
> [ERROR]   Run 1: 
> TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:1036 
> Did not count the result bytes expected:<60> but was:<120>
> [ERROR]   Run 2: 
> TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:951 » 
> IO
> [ERROR]   Run 3: 
> TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:951 » 
> IO
> [INFO] 
> {code}
> I read the code further and found that this UT is flaky.
> {code:java}
> // check byte counters
> scan2 = new Scan();
> scan2.setScanMetricsEnabled(true);
> scan2.setCaching(1);
> try (ResultScanner scanner = ht.getScanner(scan2)) {
>   int numBytes = 0;
>   for (Result result : scanner.next(1)) {
>     for (Cell cell : result.listCells()) {
>       numBytes += PrivateCellUtil.estimatedSerializedSizeOf(cell);
>     }
>   }
>   scanner.close();
>   ScanMetrics scanMetrics = scanner.getScanMetrics();
>   assertEquals("Did not count the result bytes", numBytes,
>   scanMetrics.countOfBytesInResults.get());
> }
> {code}
> The code above checks scanMetrics.countOfBytesInResults, but fetches only ONE 
> row via scanner.next(1). A total of 3 rows are inserted into the table, and the 
> scanner prefetches from the server in advance until maxCacheSize is exceeded, 
> see 
> [here|https://github.com/apache/hbase/blob/5fa15cfde3d77e77ffb1f09d60dce4db264f3831/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableResultScanner.java#L94].
> So if the scanner prefetches more than one row before it is closed, the UT 
> fails. We can reproduce this problem reliably by sleeping before 
> scanner.close().





[jira] [Created] (HBASE-25598) TestFromClientSide5.testScanMetrics is flaky

2021-02-23 Thread Sun Xin (Jira)
Sun Xin created HBASE-25598:
---

 Summary: TestFromClientSide5.testScanMetrics is flaky
 Key: HBASE-25598
 URL: https://issues.apache.org/jira/browse/HBASE-25598
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.4.1, 2.3.4, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin


In some PRs, I got the following errors in UT results.
{code:java}
[ERROR] Errors: 
[ERROR] org.apache.hadoop.hbase.client.TestFromClientSide5.testScanMetrics[0]
[ERROR]   Run 1: TestFromClientSide5.testScanMetrics:1018 Did not count the 
result bytes expected:<60> but was:<120>
[ERROR]   Run 2: TestFromClientSide5.testScanMetrics:1036 Did not count the 
result bytes expected:<60> but was:<180>
[ERROR]   Run 3: TestFromClientSide5.testScanMetrics:951 » MasterRegistryFetch 
Exception making...
[INFO] 
[ERROR] 
org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor5.testScanMetrics[1]
[ERROR]   Run 1: 
TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:1036 Did 
not count the result bytes expected:<60> but was:<120>
[ERROR]   Run 2: 
TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:951 » IO
[ERROR]   Run 3: 
TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:951 » IO
[INFO] 
{code}
I read the code further and found that this UT is flaky.
{code:java}
// check byte counters
scan2 = new Scan();
scan2.setScanMetricsEnabled(true);
scan2.setCaching(1);
try (ResultScanner scanner = ht.getScanner(scan2)) {
  int numBytes = 0;
  for (Result result : scanner.next(1)) {
    for (Cell cell : result.listCells()) {
      numBytes += PrivateCellUtil.estimatedSerializedSizeOf(cell);
    }
  }
  scanner.close();
  ScanMetrics scanMetrics = scanner.getScanMetrics();
  assertEquals("Did not count the result bytes", numBytes,
  scanMetrics.countOfBytesInResults.get());
}
{code}
The code above checks scanMetrics.countOfBytesInResults, but fetches only ONE 
row via scanner.next(1). A total of 3 rows are inserted into the table, and the 
scanner prefetches from the server in advance until maxCacheSize is exceeded, 
see 
[here|https://github.com/apache/hbase/blob/5fa15cfde3d77e77ffb1f09d60dce4db264f3831/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableResultScanner.java#L94].

So if the scanner prefetches more than one row before it is closed, the UT fails. 
We can reproduce this problem reliably by sleeping before scanner.close().
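
To make the race concrete, here is a minimal stand-alone model in plain Java (no HBase classes). The class and method names are hypothetical, and the 60 bytes per row is only inferred from the "expected:<60>" messages above. It is a sketch of the accounting mismatch, not the client's actual metrics code.

```java
final class ScanMetricsRaceSketch {
  // What the test adds up itself: only the rows it actually consumed.
  static long expectedBytes(int rowsConsumed, int bytesPerRow) {
    return (long) rowsConsumed * bytesPerRow;
  }

  // What the metric reports in this model: every row fetched from the server,
  // including rows that were prefetched but never consumed by the test.
  static long reportedBytes(int rowsFetched, int bytesPerRow) {
    return (long) rowsFetched * bytesPerRow;
  }

  public static void main(String[] args) {
    int bytesPerRow = 60; // assumption, read off the failure messages
    for (int fetched = 1; fetched <= 3; fetched++) {
      System.out.println("fetched=" + fetched
        + " expected=" + expectedBytes(1, bytesPerRow)
        + " reported=" + reportedBytes(fetched, bytesPerRow));
    }
  }
}
```

With one consumed row, the assertion only holds when exactly one row was fetched. Two or three fetched rows give the 120 and 180 seen in the failures.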





[jira] [Created] (HBASE-25590) Bulkload replication HFileRefs cannot be cleared in some cases where set exclude-namespace/exclude-table-cfs

2021-02-20 Thread Sun Xin (Jira)
Sun Xin created HBASE-25590:
---

 Summary: Bulkload replication HFileRefs cannot be cleared in some 
cases where set exclude-namespace/exclude-table-cfs
 Key: HBASE-25590
 URL: https://issues.apache.org/jira/browse/HBASE-25590
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.4.1, 2.3.4, 2.2.6, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin


In 
[ReplicationSource#addHFileRefs|https://github.com/apache/hbase/blob/ed90a14995acd87111d2b9849f07d84418ca43d4/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L264],
 we may add unwanted HFiles to the _HFileRefs_ if a peer has _replicate_all_ set 
to true and also sets _exclude-namespace/exclude-table-cfs_.

These unwanted _HFileRefs_ are never replicated to the remote cluster and never 
cleared.

Two problems are caused by this bug:
 # The metric sizeOfHFileRefsQueue cannot be zeroed.
 # Referenced HFiles cannot be deleted by _ReplicationHFileCleaner._
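
A minimal sketch of the kind of check that seems to be missing before an HFile is added to the refs. It assumes a peer with replicate_all = true, ignores column families, and uses hypothetical names throughout; it is not the actual ReplicationSource code.

```java
import java.util.Collections;
import java.util.Set;

final class HFileRefFilterSketch {
  // For a peer with replicate_all = true: a table is replicated unless its
  // namespace, or the table itself, is listed as excluded.
  static boolean isReplicated(Set<String> excludedNamespaces, Set<String> excludedTables,
      String namespace, String table) {
    return !excludedNamespaces.contains(namespace)
      && !excludedTables.contains(namespace + ":" + table);
  }

  public static void main(String[] args) {
    Set<String> excludedNamespaces = Collections.singleton("ns1");
    Set<String> excludedTables = Collections.emptySet();
    // HFiles of ns1:t1 should not be tracked, HFiles of ns2:t1 should.
    System.out.println(isReplicated(excludedNamespaces, excludedTables, "ns1", "t1"));
    System.out.println(isReplicated(excludedNamespaces, excludedTables, "ns2", "t1"));
  }
}
```

Only HFiles for which such a check returns true would need an entry in _HFileRefs_.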





[jira] [Resolved] (HBASE-25559) Terminate threads of oldsources while RS is closing

2021-02-09 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25559.
-
Fix Version/s: 2.4.2
   2.3.5
   2.2.7
   3.0.0-alpha-1
   Resolution: Fixed

Merged to master and all active branch-2.x.

> Terminate threads of oldsources while RS is closing
> ---
>
> Key: HBASE-25559
> URL: https://issues.apache.org/jira/browse/HBASE-25559
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.2.6, 2.3.4, 2.4.1
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.3.5, 2.4.2
>
>






[jira] [Created] (HBASE-25562) ReplicationSourceWALReader log and handle exception immediately without retrying

2021-02-08 Thread Sun Xin (Jira)
Sun Xin created HBASE-25562:
---

 Summary: ReplicationSourceWALReader log and handle exception 
immediately without retrying
 Key: HBASE-25562
 URL: https://issues.apache.org/jira/browse/HBASE-25562
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.4.1, 2.3.4, 2.2.6, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin


In this piece of code about retrying in ReplicationSourceWALReader#run, the sleep 
time increases with the number of retries. If an exception occurs that cannot 
recover by itself, error logs only appear after about 12 hours (300 retries by 
default).
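
As a rough check of the 12-hour figure, the sketch below adds up the sleeps under the assumption that retry k sleeps for k times a base interval. The 1-second base interval is my assumption and is not stated in this issue; the class and method names are hypothetical.

```java
final class RetryBackoffSketch {
  // Total time slept if retry k (1-based) sleeps for k * baseSleepMillis.
  static long totalSleepMillis(long baseSleepMillis, int maxRetries) {
    long total = 0;
    for (int k = 1; k <= maxRetries; k++) {
      total += baseSleepMillis * k;
    }
    return total;
  }

  public static void main(String[] args) {
    long millis = totalSleepMillis(1000L, 300); // assumed 1 s base, 300 retries
    System.out.printf("total sleep: %d s (%.1f h)%n", millis / 1000, millis / 3_600_000.0);
  }
}
```

Under these assumptions the total is 1 + 2 + ... + 300 = 45150 seconds, roughly 12.5 hours before the error is finally logged.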





[jira] [Created] (HBASE-25560) Remove unused parameter named peerId in the constructor method of CatalogReplicationSourcePeer

2021-02-08 Thread Sun Xin (Jira)
Sun Xin created HBASE-25560:
---

 Summary: Remove unused parameter named peerId in the constructor 
method of CatalogReplicationSourcePeer
 Key: HBASE-25560
 URL: https://issues.apache.org/jira/browse/HBASE-25560
 Project: HBase
  Issue Type: Bug
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Created] (HBASE-25559) Terminate threads of oldsources while RS is closing

2021-02-08 Thread Sun Xin (Jira)
Sun Xin created HBASE-25559:
---

 Summary: Terminate threads of oldsources while RS is closing
 Key: HBASE-25559
 URL: https://issues.apache.org/jira/browse/HBASE-25559
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.4.1, 2.3.4, 2.2.6, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Resolved] (HBASE-25553) It is better for ReplicationTracker.getListOfRegionServers to return ServerName instead of String

2021-02-07 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-25553.
-
Resolution: Fixed

> It is better for ReplicationTracker.getListOfRegionServers to return 
> ServerName instead of String
> -
>
> Key: HBASE-25553
> URL: https://issues.apache.org/jira/browse/HBASE-25553
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.3.5, 2.4.2
>
>






[jira] [Created] (HBASE-25553) It is better for ReplicationTracker.getListOfRegionServers to return ServerName instead of String

2021-02-04 Thread Sun Xin (Jira)
Sun Xin created HBASE-25553:
---

 Summary: It is better for 
ReplicationTracker.getListOfRegionServers to return ServerName instead of String
 Key: HBASE-25553
 URL: https://issues.apache.org/jira/browse/HBASE-25553
 Project: HBase
  Issue Type: Bug
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Created] (HBASE-25309) Support start/stop replication server by scripts

2020-11-19 Thread Sun Xin (Jira)
Sun Xin created HBASE-25309:
---

 Summary: Support start/stop replication server by scripts
 Key: HBASE-25309
 URL: https://issues.apache.org/jira/browse/HBASE-25309
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Created] (HBASE-25305) Add master UI to show ReplicationServer

2020-11-18 Thread Sun Xin (Jira)
Sun Xin created HBASE-25305:
---

 Summary: Add master UI to show ReplicationServer
 Key: HBASE-25305
 URL: https://issues.apache.org/jira/browse/HBASE-25305
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Created] (HBASE-25300) 'Unknown table hbase:quota' happens when desc table in shell if quota disabled

2020-11-17 Thread Sun Xin (Jira)
Sun Xin created HBASE-25300:
---

 Summary: 'Unknown table hbase:quota' happens when desc table in 
shell if quota disabled
 Key: HBASE-25300
 URL: https://issues.apache.org/jira/browse/HBASE-25300
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1








[jira] [Created] (HBASE-25289) [testing] Clean up resources after tests in rsgroup_shell_test.rb

2020-11-16 Thread Sun Xin (Jira)
Sun Xin created HBASE-25289:
---

 Summary: [testing] Clean up resources after tests in 
rsgroup_shell_test.rb
 Key: HBASE-25289
 URL: https://issues.apache.org/jira/browse/HBASE-25289
 Project: HBase
  Issue Type: Improvement
  Components: rsgroup, test
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


In rsgroup_shell_test.rb, some tests don't remove the rsgroups or drop the tables 
they create, which gets in the way of adding new tests.





[jira] [Created] (HBASE-25171) Remove ZNodePaths.namespaceZNode

2020-10-10 Thread Sun Xin (Jira)
Sun Xin created HBASE-25171:
---

 Summary: Remove ZNodePaths.namespaceZNode
 Key: HBASE-25171
 URL: https://issues.apache.org/jira/browse/HBASE-25171
 Project: HBase
  Issue Type: Improvement
  Components: Zookeeper
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


HBASE-21154 removed the dependency on ZNodePaths.namespaceZNode, so this field 
can be removed.





[jira] [Created] (HBASE-25117) ReplicationSourceShipper thread can not be finished

2020-09-29 Thread Sun Xin (Jira)
Sun Xin created HBASE-25117:
---

 Summary: ReplicationSourceShipper thread can not be finished
 Key: HBASE-25117
 URL: https://issues.apache.org/jira/browse/HBASE-25117
 Project: HBase
  Issue Type: Bug
Reporter: Sun Xin
Assignee: Sun Xin


See [Flaky 
Tests|https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/master/161/console],
 some replication UTs failed because of timeouts.





[jira] [Created] (HBASE-25113) [testing] HBaseCluster support ReplicationServer for UTs

2020-09-28 Thread Sun Xin (Jira)
Sun Xin created HBASE-25113:
---

 Summary: [testing] HBaseCluster support ReplicationServer for UTs
 Key: HBASE-25113
 URL: https://issues.apache.org/jira/browse/HBASE-25113
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Created] (HBASE-25100) conn is assigned twice in HBaseReplicationEndpoint and HBaseInterClusterReplicationEndpoint

2020-09-26 Thread Sun Xin (Jira)
Sun Xin created HBASE-25100:
---

 Summary: conn is assigned twice in HBaseReplicationEndpoint and 
HBaseInterClusterReplicationEndpoint
 Key: HBASE-25100
 URL: https://issues.apache.org/jira/browse/HBASE-25100
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


In 
[HBaseReplicationEndpoint.init()|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java#L109]
 and  
[HBaseInterClusterReplicationEndpoint.init|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L145]
, conn is assigned twice, because the latter is a subclass of the former.

 





[jira] [Created] (HBASE-25098) ReplicationStatisticsChore runs in wrong time unit

2020-09-25 Thread Sun Xin (Jira)
Sun Xin created HBASE-25098:
---

 Summary: ReplicationStatisticsChore runs in wrong time unit
 Key: HBASE-25098
 URL: https://issues.apache.org/jira/browse/HBASE-25098
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1








[jira] [Created] (HBASE-25014) ScheduledChore is never triggered when initalDelay > 1.5*period

2020-09-11 Thread Sun Xin (Jira)
Sun Xin created HBASE-25014:
---

 Summary: ScheduledChore is never triggered when initalDelay > 
1.5*period
 Key: HBASE-25014
 URL: https://issues.apache.org/jira/browse/HBASE-25014
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.2.5, 2.2.4, 2.2.3, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


In our recent tests, ScheduledChore is never triggered when initialDelay > 
1.5*period.

The cause of the bug is the following:

The trigger time for a ScheduledChore must fall within an acceptable time window 
of 1.5 * period; see 
[here|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ScheduledChore.java#L234]

timeOfLastRun and timeOfThisRun are two variables that record two adjacent 
trigger times. [The first initialization of 
timeOfThisRun|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ScheduledChore.java#L273]
 happens when the ScheduledChore is created, so it is not a real trigger time.

If we set initialDelay > 1.5 * period, the first trigger after initialDelay already 
falls outside the allowed window. The chore is then [cancelled and scheduled 
again|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ChoreService.java#L176].

So the chore is stuck in a loop when initialDelay > 1.5 * period:

1. Initialize timeOfThisRun at the wrong time.

2. Wait for initialDelay.

3. The chore triggers, but outside the allowed window.

4. Cancel the chore and schedule it again.

5. Go to step 1.
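
The loop above can be modelled in a few lines of plain Java. This is a simplified sketch under stated assumptions, not the ScheduledChore implementation: the class and method names are hypothetical, time is a simple counter in milliseconds, and rescheduling is assumed to wait for initialDelay again.

```java
final class ChoreWindowSketch {
  // A run is rejected if it starts more than 1.5 * period after the
  // previously recorded run time.
  static boolean missedStartTime(long timeOfLastRun, long timeOfThisRun, long periodMillis) {
    return (timeOfThisRun - timeOfLastRun) > 1.5 * periodMillis;
  }

  // Counts how often the chore body would actually run over a number of
  // scheduling attempts, when the recorded run time is initialised at
  // scheduling time rather than at a real trigger.
  static int runsCompleted(long initialDelayMillis, long periodMillis, int attempts) {
    long now = 0;
    int runs = 0;
    for (int i = 0; i < attempts; i++) {
      long timeOfLastRun = now;   // step 1: initialised when (re)scheduled
      now += initialDelayMillis;  // step 2: wait for initialDelay
      if (missedStartTime(timeOfLastRun, now, periodMillis)) {
        continue;                 // steps 3-5: outside the window, cancel and reschedule
      }
      runs++;
    }
    return runs;
  }

  public static void main(String[] args) {
    System.out.println("initialDelay = 1.0 * period: " + runsCompleted(1000, 1000, 5) + " runs");
    System.out.println("initialDelay = 2.0 * period: " + runsCompleted(2000, 1000, 5) + " runs");
  }
}
```

In this model a chore with initialDelay of twice the period never runs, no matter how many times it is rescheduled.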

 





[jira] [Created] (HBASE-25012) HBASE-24359 causes replication missed log of some RemoteException

2020-09-11 Thread Sun Xin (Jira)
Sun Xin created HBASE-25012:
---

 Summary: HBASE-24359 causes replication missed log of some 
RemoteException
 Key: HBASE-25012
 URL: https://issues.apache.org/jira/browse/HBASE-25012
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 2.3.1, 2.3.0, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


[HBASE-24359|https://issues.apache.org/jira/browse/HBASE-24359] broke the 
exception-handling logic. In branch-2, it even causes some RemoteException 
logs to be lost.





[jira] [Created] (HBASE-24999) Master manages ReplicationServers

2020-09-08 Thread Sun Xin (Jira)
Sun Xin created HBASE-24999:
---

 Summary: Master manages ReplicationServers
 Key: HBASE-24999
 URL: https://issues.apache.org/jira/browse/HBASE-24999
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin


[HBASE-24683|https://issues.apache.org/jira/browse/HBASE-24683] added an 
isolated ReplicationServer.

This issue will: 
 # Make ReplicationServer report to Master periodically.
 # Add a basic ReplicationServerManager in Master to manage ReplicationServer.





[jira] [Created] (HBASE-24982) Disassemble the method replicateWALEntry from AdminService to a new interface ReplicationSinkService

2020-09-04 Thread Sun Xin (Jira)
Sun Xin created HBASE-24982:
---

 Summary: Disassemble the method replicateWALEntry from 
AdminService to a new interface ReplicationSinkService
 Key: HBASE-24982
 URL: https://issues.apache.org/jira/browse/HBASE-24982
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Sun Xin
Assignee: Sun Xin








[jira] [Resolved] (HBASE-24683) Add a basic ReplicationServer which only implement ReplicationSink Service

2020-09-04 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-24683.
-
Resolution: Resolved

> Add a basic ReplicationServer which only implement ReplicationSink Service
> --
>
> Key: HBASE-24683
> URL: https://issues.apache.org/jira/browse/HBASE-24683
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: Sun Xin
>Priority: Major
>






[jira] [Created] (HBASE-24914) Reomve duplicate code appearing continuously in method ReplicationPeerManager.updatePeerConfig

2020-08-20 Thread Sun Xin (Jira)
Sun Xin created HBASE-24914:
---

 Summary: Reomve duplicate code appearing continuously in method 
ReplicationPeerManager.updatePeerConfig
 Key: HBASE-24914
 URL: https://issues.apache.org/jira/browse/HBASE-24914
 Project: HBase
  Issue Type: Improvement
  Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


In 
[ReplicationPeerManager.updatePeerConfig|https://github.com/apache/hbase/blob/1164531d5ab519ab58af82ba3849f8fcded3453f/hbase-server/src/main/java/org/apache/hadoop/hbase/master/replication/ReplicationPeerManager.java#L272],
 the same code appears twice in a row, so remove one copy.
{code:java}
newPeerConfigBuilder.putAllConfiguration(oldPeerConfig.getConfiguration());
newPeerConfigBuilder.putAllConfiguration(peerConfig.getConfiguration());
newPeerConfigBuilder.putAllConfiguration(oldPeerConfig.getConfiguration());
newPeerConfigBuilder.putAllConfiguration(peerConfig.getConfiguration());
{code}
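
Repeating the pair of calls does not change the result, which is why one copy can go. A minimal sketch with plain java.util.Map standing in for the peer config builder; the class and method names are hypothetical, and it assumes putAllConfiguration behaves like Map.putAll.

```java
import java.util.HashMap;
import java.util.Map;

final class PutAllTwiceSketch {
  // Applies "old config, then new config" the given number of times.
  static Map<String, String> merge(Map<String, String> oldConf, Map<String, String> newConf,
      int repeats) {
    Map<String, String> merged = new HashMap<>();
    for (int i = 0; i < repeats; i++) {
      merged.putAll(oldConf);
      merged.putAll(newConf); // new values win over old ones
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> oldConf = new HashMap<>();
    oldConf.put("a", "old");
    oldConf.put("b", "old");
    Map<String, String> newConf = new HashMap<>();
    newConf.put("b", "new");
    System.out.println(merge(oldConf, newConf, 1).equals(merge(oldConf, newConf, 2)));
  }
}
```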





[jira] [Created] (HBASE-24913) Refactor TestJMXConnectorServer

2020-08-20 Thread Sun Xin (Jira)
Sun Xin created HBASE-24913:
---

 Summary: Refactor TestJMXConnectorServer
 Key: HBASE-24913
 URL: https://issues.apache.org/jira/browse/HBASE-24913
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


This issue makes two improvements to TestJMXConnectorServer:
 # Start the cluster once, rather than once per test case.
 # Run ConnectorServer on a random free port instead of a fixed one.
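
For the second point, one common way to obtain a free port in plain Java is to bind a ServerSocket to port 0 and read back the port the OS assigned. This is only a sketch with a hypothetical class name; the test may well use an existing HBase test utility instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

final class FreePortSketch {
  // Asks the OS for any free port by binding to port 0, then releases it.
  static int findFreePort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("free port: " + findFreePort());
  }
}
```

Note that another process could grab the port between the socket being closed and the server binding to it, so a retry on bind failure is still advisable.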





[jira] [Created] (HBASE-24797) Move log code out of loop

2020-07-30 Thread Sun Xin (Jira)
Sun Xin created HBASE-24797:
---

 Summary: Move log code out of loop
 Key: HBASE-24797
 URL: https://issues.apache.org/jira/browse/HBASE-24797
 Project: HBase
  Issue Type: Bug
  Components: Normalizer
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


In HMaster#normalizeRegions, maybe we should move the logging of 
submittedPlanProcIds out of the loop.

 
{code:java}
public boolean normalizeRegions() throws IOException {
  ...
  final List<Long> submittedPlanProcIds = new ArrayList<>();
  for (TableName table : allEnabledTables) {
    ...
    for (NormalizationPlan plan : plans) {
      long procId = plan.submit(this);
      submittedPlanProcIds.add(procId);
      ...
    }
    int totalPlansSubmitted = submittedPlanProcIds.size();
    if (totalPlansSubmitted > 0 && LOG.isDebugEnabled()) {
      LOG.debug("Normalizer plans submitted. Total plans count: {} , procID list: {}",
        totalPlansSubmitted, submittedPlanProcIds);
    }
  }
  ...
}
{code}
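
The effect of the current placement can be seen in a small stand-alone model. Because submittedPlanProcIds accumulates across tables, logging inside the table loop repeats the growing list once per table. The names below are hypothetical and the "log" is just a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

final class LogOutsideLoopSketch {
  // plansPerTable[i] is the number of plans submitted for table i.
  // Returns the lines that would be logged.
  static List<String> submitAll(int[] plansPerTable, boolean logInsideLoop) {
    List<String> logLines = new ArrayList<>();
    List<Long> submittedPlanProcIds = new ArrayList<>();
    long nextProcId = 1;
    for (int plans : plansPerTable) {
      for (int i = 0; i < plans; i++) {
        submittedPlanProcIds.add(nextProcId++);
      }
      if (logInsideLoop && !submittedPlanProcIds.isEmpty()) {
        logLines.add(describe(submittedPlanProcIds)); // repeated for every table
      }
    }
    if (!logInsideLoop && !submittedPlanProcIds.isEmpty()) {
      logLines.add(describe(submittedPlanProcIds));   // logged once, after the loop
    }
    return logLines;
  }

  private static String describe(List<Long> procIds) {
    return "Total plans count: " + procIds.size() + ", procID list: " + procIds;
  }

  public static void main(String[] args) {
    int[] plansPerTable = {2, 0, 1};
    System.out.println("inside loop:");
    submitAll(plansPerTable, true).forEach(System.out::println);
    System.out.println("outside loop:");
    submitAll(plansPerTable, false).forEach(System.out::println);
  }
}
```

With three tables, the inside-loop variant logs three lines, including one for a table that submitted nothing. The outside-loop variant logs the final list once.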
 

 





[jira] [Created] (HBASE-24769) Auto scale RSGroup

2020-07-24 Thread Sun Xin (Jira)
Sun Xin created HBASE-24769:
---

 Summary: Auto scale RSGroup
 Key: HBASE-24769
 URL: https://issues.apache.org/jira/browse/HBASE-24769
 Project: HBase
  Issue Type: New Feature
  Components: rsgroup
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


Currently, if RSs go offline or come online, we must manually move them into or 
out of RSGroups.

Based on HBASE-24431, we can now configure how many servers each rsgroup needs, 
and then add an AutoScaleChore that periodically checks and moves servers.





[jira] [Created] (HBASE-24760) Allow system tables fallback to any rs groups

2020-07-23 Thread Sun Xin (Jira)
Sun Xin created HBASE-24760:
---

 Summary: Allow system tables fallback to any rs groups
 Key: HBASE-24760
 URL: https://issues.apache.org/jira/browse/HBASE-24760
 Project: HBase
  Issue Type: New Feature
  Components: rsgroup
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


In HBASE-22738 we allowed tables to fall back to specific rs groups if there are 
no online servers in the table's rsgroup.

But system tables need to be able to fall back to any rsgroup in order to stay 
available at all times.





[jira] [Created] (HBASE-24759) Persisting configuration of default rsgroup

2020-07-23 Thread Sun Xin (Jira)
Sun Xin created HBASE-24759:
---

 Summary: Persisting configuration of default rsgroup
 Key: HBASE-24759
 URL: https://issues.apache.org/jira/browse/HBASE-24759
 Project: HBase
  Issue Type: New Feature
  Components: rsgroup
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


Currently, we do not store the default rsgroup information. But HBASE-24431 
added a config map, which needs to be persisted to avoid losing the config of 
the default rsgroup.





[jira] [Created] (HBASE-24654) Allow unset table's rsgroup

2020-06-29 Thread Sun Xin (Jira)
Sun Xin created HBASE-24654:
---

 Summary: Allow unset table's rsgroup
 Key: HBASE-24654
 URL: https://issues.apache.org/jira/browse/HBASE-24654
 Project: HBase
  Issue Type: New Feature
  Components: rsgroup
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


TableDescriptorBuilder has a method to set the rsgroup but none to unset it. 
An unset method is necessary in some cases.

For example, if a table had an rsgroup config before but should now use the 
namespace config, setting the table's rsgroup config to the default rsgroup 
does not work; the rsgroup config must be removed.





[jira] [Created] (HBASE-24591) get_table_rsgroup ignored the existence of rsgroup config for namespace

2020-06-18 Thread Sun Xin (Jira)
Sun Xin created HBASE-24591:
---

 Summary: get_table_rsgroup ignored the existence of rsgroup config 
for namespace
 Key: HBASE-24591
 URL: https://issues.apache.org/jira/browse/HBASE-24591
 Project: HBase
  Issue Type: Bug
  Components: rsgroup
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


{code:java}
public GetRSGroupInfoOfTableResponse getRSGroupInfoOfTable(RpcController controller,
    GetRSGroupInfoOfTableRequest request) throws ServiceException {
  TableName tableName = ProtobufUtil.toTableName(request.getTableName());
  ...
  try {
    ...
    GetRSGroupInfoOfTableResponse resp;
    TableDescriptor td = master.getTableDescriptors().get(tableName);
    if (td == null) {
      resp = GetRSGroupInfoOfTableResponse.getDefaultInstance();
    } else {
      RSGroupInfo rsGroupInfo = null;
      if (td.getRegionServerGroup().isPresent()) {
        rsGroupInfo =
          master.getRSGroupInfoManager().getRSGroup(td.getRegionServerGroup().get());
      }
      if (rsGroupInfo == null) {
        rsGroupInfo = master.getRSGroupInfoManager().getRSGroup(RSGroupInfo.DEFAULT_GROUP);
      }
      resp = GetRSGroupInfoOfTableResponse.newBuilder()
        .setRSGroupInfo(ProtobufUtil.toProtoGroupInfo(rsGroupInfo)).build();
    }
    ...
    return resp;
  } catch (IOException e) {
    throw new ServiceException(e);
  }
}
{code}
The method MasterRpcServices#getRSGroupInfoOfTable ignores the namespace's 
hbase.rsgroup.name config: if the table has no rsgroup of its own, it falls 
straight back to the default group.

The lookup should be replaced by RSGroupUtil#getRSGroupInfo.
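
The expected lookup order can be sketched as below. This is a simplified illustration, not the code of RSGroupUtil#getRSGroupInfo: the class, the method and the map-based storage are hypothetical, and the order (table config, then namespace config, then default group) is an assumption based on the description above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the lookup; names and data structures are hypothetical.
class RSGroupLookupSketch {
  static final String DEFAULT_GROUP = "default";

  // Resolution order: table-level config, then namespace-level config, then the default group.
  static String resolveGroup(String namespace, String table,
      Map<String, String> tableGroups, Map<String, String> namespaceGroups) {
    String group = tableGroups.get(table);
    if (group == null) {
      group = namespaceGroups.get(namespace);
    }
    return group != null ? group : DEFAULT_GROUP;
  }

  public static void main(String[] args) {
    Map<String, String> tableGroups = new HashMap<>();
    Map<String, String> namespaceGroups = Collections.singletonMap("ns1", "nsGroup");
    // The table has no rsgroup of its own, so the namespace's group applies.
    System.out.println(resolveGroup("ns1", "ns1:t1", tableGroups, namespaceGroups));
    // Without a namespace-level setting the default group is used.
    System.out.println(resolveGroup("ns2", "ns2:t1", tableGroups, namespaceGroups));
  }
}
```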
 





[jira] [Created] (HBASE-24431) RSGroupInfo add configuration map to store something extra

2020-05-25 Thread Sun Xin (Jira)
Sun Xin created HBASE-24431:
---

 Summary: RSGroupInfo add configuration map to store something extra
 Key: HBASE-24431
 URL: https://issues.apache.org/jira/browse/HBASE-24431
 Project: HBase
  Issue Type: Improvement
  Components: rsgroup
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


Maybe we should add a _Map configuration_ to RSGroupInfo to store extra 
information.

For example, we could store the minimum number of machines a group needs, so 
that machines can be moved into the group automatically.
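
A rough sketch of how such a map could be used is given below. Everything in it is hypothetical: GroupInfoSketch is not the real RSGroupInfo, and the key sketch.min.servers is made up for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified group info; the real RSGroupInfo has more state.
class GroupInfoSketch {
  // Illustrative key only; not a real HBase configuration name.
  static final String MIN_SERVERS_KEY = "sketch.min.servers";

  private final Map<String, String> configuration = new HashMap<>();
  private final int serverCount;

  GroupInfoSketch(int serverCount) {
    this.serverCount = serverCount;
  }

  void setConfiguration(String key, String value) {
    configuration.put(key, value);
  }

  // Servers still missing to reach the configured minimum; 0 if no minimum is set.
  int serversNeeded() {
    int min = Integer.parseInt(configuration.getOrDefault(MIN_SERVERS_KEY, "0"));
    return Math.max(0, min - serverCount);
  }

  public static void main(String[] args) {
    GroupInfoSketch group = new GroupInfoSketch(2);
    group.setConfiguration(MIN_SERVERS_KEY, "5");
    // A balancer-like component could use this number to decide how many servers to move in.
    System.out.println(group.serversNeeded());
  }
}
```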





[jira] [Created] (HBASE-24416) RegionNormalizer spliting region should not be limited by hbase.normalizer.min.region.count

2020-05-21 Thread Sun Xin (Jira)
Sun Xin created HBASE-24416:
---

 Summary: RegionNormalizer spliting region should not be limited by 
hbase.normalizer.min.region.count
 Key: HBASE-24416
 URL: https://issues.apache.org/jira/browse/HBASE-24416
 Project: HBase
  Issue Type: Improvement
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


In the method computePlanForTable of SimpleRegionNormalizer, we skip splitting 
regions if the number of regions in the table is less than 
hbase.normalizer.min.region.count, even if there is a huge region in the table.
{code:java}
...
if (tableRegions == null || tableRegions.size() < minRegionCount) {
  ...
  return null;
}

...
// get region split plan
if (splitEnabled) {
  List splitPlans = getSplitNormalizationPlan(table);
  if (splitPlans != null) {
    plans.addAll(splitPlans);
  }
}
{code}
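
One way to express the intended behaviour is to let the minimum region count gate only the merge plans. The sketch below is a simplified assumption, not the SimpleRegionNormalizer code: the class name, the thresholds (split above twice the average size, merge two neighbours whose combined size is below the average) and the plan strings are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch; thresholds and plan strings are illustrative only.
class NormalizerSketch {
  static List<String> computePlans(long[] regionSizesMb, int minRegionCount) {
    List<String> plans = new ArrayList<>();
    if (regionSizesMb.length == 0) {
      return plans;
    }
    long total = 0;
    for (long size : regionSizesMb) {
      total += size;
    }
    double avg = (double) total / regionSizesMb.length;

    // Splits are considered regardless of how many regions the table has.
    for (int i = 0; i < regionSizesMb.length; i++) {
      if (regionSizesMb[i] > 2 * avg) {
        plans.add("split region " + i);
      }
    }

    // Only merges are gated by the minimum region count.
    if (regionSizesMb.length >= minRegionCount) {
      for (int i = 0; i + 1 < regionSizesMb.length; i++) {
        if (regionSizesMb[i] + regionSizesMb[i + 1] < avg) {
          plans.add("merge regions " + i + " and " + (i + 1));
          i++; // do not reuse a region already scheduled for a merge
        }
      }
    }
    return plans;
  }

  public static void main(String[] args) {
    // Three regions with a minimum of five: the huge region is still split.
    System.out.println(computePlans(new long[] {1000, 10, 10}, 5));
  }
}
```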
 





[jira] [Created] (HBASE-24399) [Flakey Tests] Some UTs about RSGroup should wait RSGroupInfoManager to be online

2020-05-19 Thread Sun Xin (Jira)
Sun Xin created HBASE-24399:
---

 Summary: [Flakey Tests] Some UTs about RSGroup should wait 
RSGroupInfoManager to be online
 Key: HBASE-24399
 URL: https://issues.apache.org/jira/browse/HBASE-24399
 Project: HBase
  Issue Type: Improvement
  Components: rsgroup
Affects Versions: 2.3.0
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 2.3.0


Calling addRSGroup accesses the table hbase:rsgroup, so the UTs about RSGroup 
should ensure that RSGroupInfoManagerImpl is online before testing.

Otherwise, the following exception may be seen.
{code:java}
java.io.IOException: java.io.IOException: Only servers in default group can be updated during offline mode
  at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl.flushConfig(RSGroupInfoManagerImpl.java:602)
  at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl.addRSGroup(RSGroupInfoManagerImpl.java:217)
  at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.addRSGroup(RSGroupAdminServer.java:391)
{code}
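
The wait could look roughly like the polling helper below. It is a generic sketch, not the HBase test utility: waitFor and the AtomicBoolean standing in for "RSGroupInfoManager is online" are hypothetical, and a real test would poll the actual manager state instead.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Generic polling helper; a stand-in for whatever wait utility the tests already use.
class WaitForOnlineSketch {
  // Polls the condition until it holds or timeoutMs elapses; returns whether it held.
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long start = System.nanoTime();
    while (!condition.getAsBoolean()) {
      if ((System.nanoTime() - start) / 1_000_000L >= timeoutMs) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Hypothetical flag standing in for "RSGroupInfoManager is online".
    AtomicBoolean online = new AtomicBoolean(false);
    new Thread(() -> online.set(true)).start();
    if (!waitFor(5000, 10, online::get)) {
      throw new IllegalStateException("manager did not come online in time");
    }
    System.out.println("online, safe to call addRSGroup");
  }
}
```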





[jira] [Created] (HBASE-24359) Optionally ignore edits for deleted CFs for replication.

2020-05-12 Thread Sun Xin (Jira)
Sun Xin created HBASE-24359:
---

 Summary: Optionally ignore edits for deleted CFs for replication.
 Key: HBASE-24359
 URL: https://issues.apache.org/jira/browse/HBASE-24359
 Project: HBase
  Issue Type: Improvement
  Components: Replication
Affects Versions: 2.2.4
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0-alpha-1


Replication gets stuck after we delete CFs from both the source and the sink 
if the source still has outstanding edits for those CFs that it can no longer 
get rid of. All replication is then backed up behind these unreplicable edits.
We should have an option to ignore edits for deleted CFs at the source.

This issue is similar to 
[HBASE-12091|https://issues.apache.org/jira/browse/HBASE-12091]
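
The proposed option can be sketched as a filter at the source. The sketch is an assumption about the shape of the idea only: Edit, filter and the dropOnDeletedCf flag are hypothetical names, not HBase's WAL types or a real configuration key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified sketch; Edit and the flag are hypothetical, not HBase's WAL types or config.
class DeletedCfFilterSketch {
  static final class Edit {
    final String family;
    final String value;

    Edit(String family, String value) {
      this.family = family;
      this.value = value;
    }
  }

  // Keeps every edit unless dropping is enabled and the edit's CF no longer exists.
  static List<Edit> filter(List<Edit> edits, Set<String> existingFamilies,
      boolean dropOnDeletedCf) {
    List<Edit> kept = new ArrayList<>();
    for (Edit edit : edits) {
      if (!dropOnDeletedCf || existingFamilies.contains(edit.family)) {
        kept.add(edit);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Edit> pending = Arrays.asList(new Edit("cf1", "a"), new Edit("cf2", "b"));
    Set<String> existing = new HashSet<>(Arrays.asList("cf1")); // cf2 has been deleted
    // With the option on, the edit for the deleted cf2 is skipped instead of blocking the queue.
    System.out.println(filter(pending, existing, true).size());
  }
}
```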





[jira] [Resolved] (HBASE-24166) Duplicate implementation for acquireLock between CreateTableProcedure and its parent class

2020-04-10 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-24166.
-
Resolution: Duplicate

> Duplicate implementation for acquireLock between CreateTableProcedure and its 
> parent class
> --
>
> Key: HBASE-24166
> URL: https://issues.apache.org/jira/browse/HBASE-24166
> Project: HBase
>  Issue Type: Improvement
>  Components: proc-v2
>Affects Versions: 3.0.0, 2.2.4
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Minor
> Fix For: 3.0.0
>
>
> The overriding _acquireLock_ methods in _CreateTableProcedure_ and 
> _InitMetaProcedure_ are the same as the implementation in their parent class 
> _AbstractStateMachineTableProcedure_, so the overrides in the subclasses can 
> be deleted.





[jira] [Created] (HBASE-24166) Duplicate implementation for acquireLock between CreateTableProcedure and its parent class

2020-04-10 Thread Sun Xin (Jira)
Sun Xin created HBASE-24166:
---

 Summary: Duplicate implementation for acquireLock between 
CreateTableProcedure and its parent class
 Key: HBASE-24166
 URL: https://issues.apache.org/jira/browse/HBASE-24166
 Project: HBase
  Issue Type: Improvement
  Components: proc-v2
Affects Versions: 2.2.4, 3.0.0
Reporter: Sun Xin
Assignee: Sun Xin
 Fix For: 3.0.0


The overriding _acquireLock_ methods in _CreateTableProcedure_ and 
_InitMetaProcedure_ are the same as the implementation in their parent class 
_AbstractStateMachineTableProcedure_, so the overrides in the subclasses can 
be deleted.





[jira] [Reopened] (HBASE-23376) NPE happens while replica region is moving

2019-12-08 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin reopened HBASE-23376:
-

> NPE happens while replica region is moving
> --
>
> Key: HBASE-23376
> URL: https://issues.apache.org/jira/browse/HBASE-23376
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Minor
> Attachments: HBASE-23376.branch-2.001.patch
>
>
> The following code is from AsyncNonMetaRegionLocator#addToCache:
> {code:java}
> private RegionLocations addToCache(TableCache tableCache, RegionLocations locs) {
>   LOG.trace("Try adding {} to cache", locs);
>   byte[] startKey = locs.getDefaultRegionLocation().getRegion().getStartKey();
>   ...
> }{code}
> We will get an NPE if locs does not contain the default region location.
>
> The following code is from 
> AsyncRegionLocatorHelper#updateCachedLocationOnError:
> {code:java}
> ...
> if (cause instanceof RegionMovedException) {
>   RegionMovedException rme = (RegionMovedException) cause;
>   HRegionLocation newLoc =
>     new HRegionLocation(loc.getRegion(), rme.getServerName(), rme.getLocationSeqNum());
>   LOG.debug("Try updating {} with the new location {} constructed by {}", loc, newLoc,
>     rme.toString());
>   addToCache.accept(newLoc);
> ...{code}
> If a replica region is moving, we get a RegionMovedException and add the 
> HRegionLocation of the replica region to the cache, which finally causes 
> the NPE:
> {code:java}
> java.lang.NullPointerException
>   at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.addToCache(AsyncNonMetaRegionLocator.java:240)
>   at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.addLocationToCache(AsyncNonMetaRegionLocator.java:596)
>   at org.apache.hadoop.hbase.client.AsyncRegionLocatorHelper.updateCachedLocationOnError(AsyncRegionLocatorHelper.java:80)
>   at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.updateCachedLocationOnError(AsyncNonMetaRegionLocator.java:610)
>   at org.apache.hadoop.hbase.client.AsyncRegionLocator.updateCachedLocationOnError(AsyncRegionLocator.java:153)
> {code}





[jira] [Resolved] (HBASE-23376) NPE happens while replica region is moving

2019-12-08 Thread Sun Xin (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Xin resolved HBASE-23376.
-
Resolution: Fixed

> NPE happens while replica region is moving
> --
>
> Key: HBASE-23376
> URL: https://issues.apache.org/jira/browse/HBASE-23376
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
>Reporter: Sun Xin
>Assignee: Sun Xin
>Priority: Minor
> Attachments: HBASE-23376.branch-2.001.patch
>
>





[jira] [Created] (HBASE-23376) NPE happens while replica region is moving

2019-12-05 Thread Sun Xin (Jira)
Sun Xin created HBASE-23376:
---

 Summary: NPE happens while replica region is moving
 Key: HBASE-23376
 URL: https://issues.apache.org/jira/browse/HBASE-23376
 Project: HBase
  Issue Type: Bug
  Components: read replicas
Reporter: Sun Xin
Assignee: Sun Xin


The following code is from AsyncNonMetaRegionLocator#addToCache:
{code:java}
private RegionLocations addToCache(TableCache tableCache, RegionLocations locs) {
  LOG.trace("Try adding {} to cache", locs);
  byte[] startKey = locs.getDefaultRegionLocation().getRegion().getStartKey();
  ...
}{code}
We will get an NPE if locs does not contain the default region location.

The following code is from AsyncRegionLocatorHelper#updateCachedLocationOnError:
{code:java}
...
if (cause instanceof RegionMovedException) {
  RegionMovedException rme = (RegionMovedException) cause;
  HRegionLocation newLoc =
    new HRegionLocation(loc.getRegion(), rme.getServerName(), rme.getLocationSeqNum());
  LOG.debug("Try updating {} with the new location {} constructed by {}", loc, newLoc,
    rme.toString());
  addToCache.accept(newLoc);
...{code}
If a replica region is moving, we get a RegionMovedException and add the 
HRegionLocation of the replica region to the cache, which finally causes the 
NPE:
{code:java}
java.lang.NullPointerException
  at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.addToCache(AsyncNonMetaRegionLocator.java:240)
  at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.addLocationToCache(AsyncNonMetaRegionLocator.java:596)
  at org.apache.hadoop.hbase.client.AsyncRegionLocatorHelper.updateCachedLocationOnError(AsyncRegionLocatorHelper.java:80)
  at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.updateCachedLocationOnError(AsyncNonMetaRegionLocator.java:610)
  at org.apache.hadoop.hbase.client.AsyncRegionLocator.updateCachedLocationOnError(AsyncRegionLocator.java:153)
{code}
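
One possible guard is to skip the cache update when the default replica's location is missing. The sketch below is not the attached patch and does not use the HBase client classes: Locations, getDefaultServer and the map-based cache are hypothetical simplifications of RegionLocations and the locator cache.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for RegionLocations and the locator cache; all names are hypothetical.
class ReplicaCacheGuardSketch {
  static final class Locations {
    final String startKey;
    // Indexed by replica id; index 0 is the default replica and may be null.
    final String[] serverByReplicaId;

    Locations(String startKey, String[] serverByReplicaId) {
      this.startKey = startKey;
      this.serverByReplicaId = serverByReplicaId;
    }

    String getDefaultServer() {
      return serverByReplicaId.length > 0 ? serverByReplicaId[0] : null;
    }
  }

  private final Map<String, Locations> cache = new HashMap<>();

  // Caches the locations only when the default replica is present; returns whether it did.
  boolean addToCache(Locations locs) {
    if (locs.getDefaultServer() == null) {
      // Replica-only update: skip it instead of dereferencing the missing default location.
      return false;
    }
    cache.put(locs.startKey, locs);
    return true;
  }

  public static void main(String[] args) {
    ReplicaCacheGuardSketch locator = new ReplicaCacheGuardSketch();
    // Only replica 1 is known, as after a RegionMovedException for a replica region.
    System.out.println(locator.addToCache(new Locations("row-a", new String[] {null, "rs2"})));
    System.out.println(locator.addToCache(new Locations("row-a", new String[] {"rs1", "rs2"})));
  }
}
```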





[jira] [Created] (HBASE-23345) Table need to replication unless all of cfs are excluded

2019-11-26 Thread Sun Xin (Jira)
Sun Xin created HBASE-23345:
---

 Summary: Table need to replication unless all of cfs are excluded
 Key: HBASE-23345
 URL: https://issues.apache.org/jira/browse/HBASE-23345
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Sun Xin


ReplicationPeerConfig.needToReplicate returns false when replicateAllUserTables 
is true and excludeTableCFsMap contains only some of the table's CFs.

It should decide by whether all of the table's CFs are excluded.
{code:java}
public boolean needToReplicate(TableName table) {
  if (replicateAllUserTables) {
    if (excludeNamespaces != null &&
      excludeNamespaces.contains(table.getNamespaceAsString())) {
      return false;
    }
    if (excludeTableCFsMap != null && excludeTableCFsMap.containsKey(table)) {
      return false;
    }
    return true;
  } else {
    if (namespaces != null &&
      namespaces.contains(table.getNamespaceAsString())) {
      return true;
    }
    if (tableCFsMap != null && tableCFsMap.containsKey(table)) {
      return true;
    }
    return false;
  }
}
{code}
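
The intended check for the exclude map can be sketched as follows. It is a simplified assumption, not the actual fix: table names are plain strings, the table's CFs are passed in explicitly, and a null or empty CF list in the map is assumed to mean that the whole table is excluded.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified sketch of the exclude-map branch only; types and names are hypothetical.
class NeedToReplicateSketch {
  static boolean needToReplicate(String table, Set<String> tableFamilies,
      Map<String, List<String>> excludeTableCFs) {
    if (excludeTableCFs == null || !excludeTableCFs.containsKey(table)) {
      return true; // the table is not mentioned in the exclude map
    }
    List<String> excluded = excludeTableCFs.get(table);
    if (excluded == null || excluded.isEmpty()) {
      return false; // assumed convention: no CF list means the whole table is excluded
    }
    // Replicate as long as at least one CF of the table is not excluded.
    return !excluded.containsAll(tableFamilies);
  }

  public static void main(String[] args) {
    Set<String> families = new HashSet<>(Arrays.asList("cf1", "cf2"));
    Map<String, List<String>> exclude = new HashMap<>();
    exclude.put("t1", Arrays.asList("cf1"));
    // Only cf1 is excluded, so t1 still needs replication for cf2.
    System.out.println(needToReplicate("t1", families, exclude));
    exclude.put("t1", Arrays.asList("cf1", "cf2"));
    // Every CF is excluded, so t1 does not need replication.
    System.out.println(needToReplicate("t1", families, exclude));
  }
}
```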


