[jira] [Resolved] (HBASE-28330) TestUnknownServers.testListUnknownServers is flaky in branch-2
[ https://issues.apache.org/jira/browse/HBASE-28330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-28330.
Fix Version/s: 2.6.0, 2.5.8
Resolution: Fixed

Pushed to branch-2, branch-2.5, branch-2.6. Thanks for the review [~zhangduo]

> TestUnknownServers.testListUnknownServers is flaky in branch-2
>
> Key: HBASE-28330
> URL: https://issues.apache.org/jira/browse/HBASE-28330
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 2.5.7
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Major
> Fix For: 2.6.0, 2.5.8
>
> {code:java}
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.913 s <<< FAILURE! - in org.apache.hadoop.hbase.master.TestUnknownServers
> [ERROR] org.apache.hadoop.hbase.master.TestUnknownServers.testListUnknownServers Time elapsed: 0.204 s <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<2>
> {code}
> The value of TestUnknownServers.SLAVES differs between [branch-2|https://github.com/apache/hbase/blob/68bc533f7116cedc681704b82319e5793b827621/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestUnknownServers.java#L44] and [master|https://github.com/apache/hbase/blob/b87b05c847f00c292664d894c21f83c73d48460d/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestUnknownServers.java#L43]. It is 1 in master but 2 in branch-2.
> A RegionServer is marked UNKNOWN_SERVER when it *holds regions* but is not tracked by the ServerManager.
> Please see HMaster.getUnknownServers:
> {code:java}
> private List<ServerName> getUnknownServers() {
>   if (serverManager != null) {
>     final Set<ServerName> serverNames = getAssignmentManager().getRegionStates().getRegionStates()
>       .stream().map(RegionState::getServerName).collect(Collectors.toSet());
>     final List<ServerName> unknownServerNames = serverNames.stream()
>       .filter(sn -> sn != null && serverManager.isServerUnknown(sn)).collect(Collectors.toList());
>     return unknownServerNames;
>   }
>   return null;
> }
> {code}
> In the UT TestUnknownServers.testListUnknownServers, we start an HBase cluster with 2 RegionServers. If all regions are assigned to ONE server, then only that server is reported as UNKNOWN_SERVER, and the UT fails.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28330) TestUnknownServers.testListUnknownServers is flaky in branch-2
Sun Xin created HBASE-28330:

Summary: TestUnknownServers.testListUnknownServers is flaky in branch-2
Key: HBASE-28330
URL: https://issues.apache.org/jira/browse/HBASE-28330
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 2.5.7
Reporter: Sun Xin
Assignee: Sun Xin

{code:java}
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.913 s <<< FAILURE! - in org.apache.hadoop.hbase.master.TestUnknownServers
[ERROR] org.apache.hadoop.hbase.master.TestUnknownServers.testListUnknownServers Time elapsed: 0.204 s <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<2>
{code}
The value of TestUnknownServers.SLAVES differs between branch-2 and master. It is 1 in master but 2 in branch-2.
A RegionServer is marked UNKNOWN_SERVER when it *holds regions* but is not tracked by the ServerManager.
Please see HMaster.getUnknownServers:
{code:java}
private List<ServerName> getUnknownServers() {
  if (serverManager != null) {
    final Set<ServerName> serverNames = getAssignmentManager().getRegionStates().getRegionStates()
      .stream().map(RegionState::getServerName).collect(Collectors.toSet());
    final List<ServerName> unknownServerNames = serverNames.stream()
      .filter(sn -> sn != null && serverManager.isServerUnknown(sn)).collect(Collectors.toList());
    return unknownServerNames;
  }
  return null;
}
{code}
In the UT TestUnknownServers.testListUnknownServers, we start an HBase cluster with 2 RegionServers. If all regions are assigned to ONE server, then only that server is reported as UNKNOWN_SERVER, and the UT fails.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
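The dependence on region placement described in HBASE-28330 can be illustrated with a small stand-alone sketch. It is not the HBase code: server names are plain strings, `regionToServer` and `trackedServers` are hypothetical stand-ins for the region states and for the servers the ServerManager knows about, and "unknown" is simplified to "holds a region but is not tracked".

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class UnknownServerCount {
  // Hypothetical simplification of HMaster.getUnknownServers: collect the distinct
  // servers that host at least one region and are not in the tracked set.
  static List<String> unknownServers(Map<String, String> regionToServer,
      Set<String> trackedServers) {
    return regionToServer.values().stream()
      .filter(sn -> sn != null && !trackedServers.contains(sn))
      .distinct()
      .sorted()
      .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Set<String> tracked = new HashSet<>(); // assumption: no server is tracked

    // Two servers exist, but every region landed on rs1.
    Map<String, String> skewed = new LinkedHashMap<>();
    skewed.put("region-a", "rs1");
    skewed.put("region-b", "rs1");

    // Regions are spread over both servers.
    Map<String, String> spread = new LinkedHashMap<>();
    spread.put("region-a", "rs1");
    spread.put("region-b", "rs2");

    System.out.println(unknownServers(skewed, tracked)); // [rs1]
    System.out.println(unknownServers(spread, tracked)); // [rs1, rs2]
  }
}
```

With two servers the result size is 1 or 2 depending on where the regions happen to be assigned, which is why an assertion on a fixed count can be flaky.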
[jira] [Resolved] (HBASE-28324) TestRegionNormalizerWorkQueue#testTake is flaky
[ https://issues.apache.org/jira/browse/HBASE-28324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-28324.
Fix Version/s: 2.6.0, 2.4.18, 2.5.8, 3.0.0-beta-2
Resolution: Fixed

Pushed to all active branches. Thanks for the review [~zhangduo]

> TestRegionNormalizerWorkQueue#testTake is flaky
>
> Key: HBASE-28324
> URL: https://issues.apache.org/jira/browse/HBASE-28324
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0-beta-1, 2.5.7
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Major
> Fix For: 2.6.0, 2.4.18, 2.5.8, 3.0.0-beta-2

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28324) TestRegionNormalizerWorkQueue#testTake is flaky
Sun Xin created HBASE-28324:

Summary: TestRegionNormalizerWorkQueue#testTake is flaky
Key: HBASE-28324
URL: https://issues.apache.org/jira/browse/HBASE-28324
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 2.5.7, 3.0.0-beta-1
Reporter: Sun Xin
Assignee: Sun Xin

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27469) IllegalArgumentException is thrown by SnapshotScannerHDFSAclController when dropping a table
[ https://issues.apache.org/jira/browse/HBASE-27469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-27469.
Fix Version/s: 2.5.2, 2.4.16 (was: 2.6.0)
Resolution: Fixed

> IllegalArgumentException is thrown by SnapshotScannerHDFSAclController when dropping a table
>
> Key: HBASE-27469
> URL: https://issues.apache.org/jira/browse/HBASE-27469
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 3.0.0-alpha-3, 2.5.1
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Major
> Fix For: 3.0.0-alpha-4, 2.5.2, 2.4.16
>
> If the scan snapshot feature is enabled and permissions on a table and on a namespace are granted to the same user, an IllegalArgumentException is thrown when dropping tables.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27476) Recovered replication may be blocked if enabled hbase.separate.oldlogdir.by.regionserver
Sun Xin created HBASE-27476:

Summary: Recovered replication may be blocked if enabled hbase.separate.oldlogdir.by.regionserver
Key: HBASE-27476
URL: https://issues.apache.org/jira/browse/HBASE-27476
Project: HBase
Issue Type: Bug
Components: Replication
Affects Versions: 2.4.15, 3.0.0-alpha-3
Reporter: Sun Xin
Assignee: Sun Xin

In another PR, I got a failed UT:
{code:java}
[ERROR] Failures:
[ERROR] org.apache.hadoop.hbase.replication.TestReplicationKillMasterRSWithSeparateOldWALs.killOneMasterRS
[ERROR] Run 1: TestReplicationKillMasterRSWithSeparateOldWALs>TestReplicationKillMasterRS.killOneMasterRS:47->TestReplicationKillRS.loadTableAndKillRS:84 Waited too much time for queueFailover replication. Waited 61065ms.
[ERROR] Run 2: TestReplicationKillMasterRSWithSeparateOldWALs>TestReplicationKillMasterRS.killOneMasterRS:47->TestReplicationKillRS.loadTableAndKillRS:84 Waited too much time for queueFailover replication. Waited 58864ms.
[ERROR] Run 3: TestReplicationKillMasterRSWithSeparateOldWALs>TestReplicationKillMasterRS.killOneMasterRS:47->TestReplicationKillRS.loadTableAndKillRS:84 Waited too much time for queueFailover replication. Waited 57103ms.
{code}
This is most likely caused by a bug. If {_}hbase.separate.oldlogdir.by.regionserver{_} is enabled, old WALs are moved into a separate directory per RegionServer name, like root/oldWALs/server1/wal1. Recovered replication cannot convert a WAL path (like root/oldWALs/wal1) into such a path, and throws FileNotFoundException.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
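The two archive layouts mentioned in HBASE-27476 can be sketched as a lookup that tries both candidate locations. This is an illustration only, not the actual fix: `OldWalLocator`, `candidates`, and `locate` are hypothetical names, and the set `existing` stands in for a real file system check.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class OldWalLocator {
  // Candidate locations of an archived WAL: the flat layout first, then the
  // per-server layout used when old WALs are separated by RegionServer.
  static List<Path> candidates(Path oldWalsDir, String serverName, String walName) {
    return Arrays.asList(
      oldWalsDir.resolve(walName),
      oldWalsDir.resolve(serverName).resolve(walName));
  }

  // Returns the first candidate that exists; `existing` is a stand-in for the file system.
  static Optional<Path> locate(Path oldWalsDir, String serverName, String walName,
      Set<Path> existing) {
    return candidates(oldWalsDir, serverName, walName).stream()
      .filter(existing::contains)
      .findFirst();
  }

  public static void main(String[] args) {
    Path oldWals = Paths.get("root", "oldWALs");
    Set<Path> existing = new HashSet<>();
    existing.add(oldWals.resolve("server1").resolve("wal1")); // per-server layout in use

    // Looking only at root/oldWALs/wal1 would miss the file; trying both layouts finds it.
    System.out.println(locate(oldWals, "server1", "wal1", existing));
  }
}
```

A reader that only ever builds the flat path would report the WAL as missing in this setup, which matches the FileNotFoundException described above.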
[jira] [Created] (HBASE-27469) IllegalArgumentException is thrown by SnapshotScannerHDFSAclController when dropping a table
Sun Xin created HBASE-27469:

Summary: IllegalArgumentException is thrown by SnapshotScannerHDFSAclController when dropping a table
Key: HBASE-27469
URL: https://issues.apache.org/jira/browse/HBASE-27469
Project: HBase
Issue Type: Bug
Components: snapshots
Affects Versions: 2.5.1, 3.0.0-alpha-3
Reporter: Sun Xin
Assignee: Sun Xin
Fix For: 2.6.0, 3.0.0-alpha-4

If the scan snapshot feature is enabled and permissions on a table and on a namespace are granted to the same user, an IllegalArgumentException is thrown when dropping tables.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27354) EOF thrown by WALEntryStream causes replication blocking
Sun Xin created HBASE-27354:

Summary: EOF thrown by WALEntryStream causes replication blocking
Key: HBASE-27354
URL: https://issues.apache.org/jira/browse/HBASE-27354
Project: HBase
Issue Type: Bug
Components: Replication
Affects Versions: 2.4.14, 3.0.0-alpha-3, 2.5.0, 2.6.0
Reporter: Sun Xin
Assignee: Sun Xin

In [WALEntryStream#readNextEntryAndRecordReaderPosition|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L257], it is possible that we read uncommitted data. If we read beyond the committed file length, we reopen the inputStream and seek back.

In our use, we found that the position we seek back to may be exactly the length of the file being written, which may cause an EOF.

The thrown EOF is finally caught in [ReplicationSourceWALReader.run|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L158], but [totalBufferUsed|https://github.com/apache/hbase/blob/308cd729d23329e6d8d4b9c17a645180374b5962/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L78] is not cleaned up. After a long run, all peers slow down and eventually block completely.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
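The accounting problem in HBASE-27354 can be sketched with a shared counter that is reserved before a read and must be released when the read fails. The class below is a hypothetical model, not the ReplicationSourceWALReader code: `BufferQuotaSketch`, `BatchReader`, and the `releaseOnFailure` switch exist only to contrast the leaking and the non-leaking behaviour.

```java
import java.io.EOFException;
import java.util.concurrent.atomic.AtomicLong;

final class BufferQuotaSketch {
  // Stand-in for the shared totalBufferUsed counter.
  private final AtomicLong totalBufferUsed = new AtomicLong();

  interface BatchReader {
    void read() throws EOFException;
  }

  long used() {
    return totalBufferUsed.get();
  }

  // Reserves quota for a batch, then reads it. On success the quota stays reserved
  // (in this model it would be released after shipping). On EOF the quota is
  // released only when releaseOnFailure is true.
  boolean readBatch(long batchBytes, BatchReader reader, boolean releaseOnFailure) {
    totalBufferUsed.addAndGet(batchBytes);
    try {
      reader.read();
      return true;
    } catch (EOFException e) {
      if (releaseOnFailure) {
        totalBufferUsed.addAndGet(-batchBytes);
      }
      return false;
    }
  }

  public static void main(String[] args) {
    BatchReader failing = () -> { throw new EOFException("seek position equals file length"); };

    BufferQuotaSketch leaky = new BufferQuotaSketch();
    leaky.readBatch(1024, failing, false);
    System.out.println("without cleanup: " + leaky.used()); // 1024 bytes stay reserved

    BufferQuotaSketch clean = new BufferQuotaSketch();
    clean.readBatch(1024, failing, true);
    System.out.println("with cleanup: " + clean.used()); // 0
  }
}
```

In the leaking variant every failed read permanently shrinks the remaining quota, which is consistent with peers slowing down and eventually blocking after a long run.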
[jira] [Resolved] (HBASE-26956) ExportSnapshot tool supports removing TTL
[ https://issues.apache.org/jira/browse/HBASE-26956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-26956.
Fix Version/s: 2.5.0, 2.6.0
Resolution: Done

Pushed to branch-2 and branch-2.5

> ExportSnapshot tool supports removing TTL
>
> Key: HBASE-26956
> URL: https://issues.apache.org/jira/browse/HBASE-26956
> Project: HBase
> Issue Type: New Feature
> Components: snapshots
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3
>
> In our scenario, we use ExportSnapshot to copy snapshots to cold storage like S3. But when a snapshot is restored back to the HBase cluster, it is deleted right away because its TTL is set.
> So we need the ExportSnapshot tool to support removing the TTL.

--
This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Reopened] (HBASE-26956) ExportSnapshot tool supports removing TTL
[ https://issues.apache.org/jira/browse/HBASE-26956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin reopened HBASE-26956:

Will close this issue after porting to branch-2.x.

> ExportSnapshot tool supports removing TTL
>
> Key: HBASE-26956
> URL: https://issues.apache.org/jira/browse/HBASE-26956
> Project: HBase
> Issue Type: New Feature
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Major
> Fix For: 3.0.0-alpha-3
>
> In our scenario, we use ExportSnapshot to copy snapshots to cold storage like S3. But when a snapshot is restored back to the HBase cluster, it is deleted right away because its TTL is set.
> So we need the ExportSnapshot tool to support removing the TTL.

--
This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (HBASE-26956) ExportSnapshot tool supports removing TTL
[ https://issues.apache.org/jira/browse/HBASE-26956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-26956.
Fix Version/s: 3.0.0-alpha-4
Release Note: The ExportSnapshot tool supports removing the TTL of a snapshot. If we use the ExportSnapshot tool to recover a snapshot with a TTL from cold storage to an HBase cluster, we can set `-reset-ttl` to prevent the snapshot from being deleted immediately.
Resolution: Done

Thanks for the review. [~zhangduo]

> ExportSnapshot tool supports removing TTL
>
> Key: HBASE-26956
> URL: https://issues.apache.org/jira/browse/HBASE-26956
> Project: HBase
> Issue Type: New Feature
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Major
> Fix For: 3.0.0-alpha-4
>
> In our scenario, we use ExportSnapshot to copy snapshots to cold storage like S3. But when a snapshot is restored back to the HBase cluster, it is deleted right away because its TTL is set.
> So we need the ExportSnapshot tool to support removing the TTL.

--
This message was sent by Atlassian Jira (v8.20.7#820007)
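Why a restored snapshot can disappear immediately, and why resetting the TTL avoids that, can be shown with a small model of an expiry check. This is a sketch under stated assumptions, not HBase's implementation: `SnapshotTtlSketch`, `isExpired`, and `NO_TTL` are hypothetical names, and treating a non-positive TTL as "keep forever" is an assumption of this example.

```java
import java.util.concurrent.TimeUnit;

final class SnapshotTtlSketch {
  // Assumption of this sketch: a non-positive TTL means the snapshot never expires.
  static final long NO_TTL = 0L;

  // A snapshot is expired once its creation time plus its TTL lies in the past.
  static boolean isExpired(long creationTimeMs, long ttlSeconds, long nowMs) {
    return ttlSeconds > 0 && creationTimeMs + TimeUnit.SECONDS.toMillis(ttlSeconds) < nowMs;
  }

  public static void main(String[] args) {
    long createdMs = 0L;                              // snapshot taken at time 0
    long sevenDaysSec = TimeUnit.DAYS.toSeconds(7);   // original TTL
    long restoreTimeMs = TimeUnit.DAYS.toMillis(30);  // restored from cold storage 30 days later

    // The original creation time and TTL travel with the snapshot, so it is already expired.
    System.out.println(isExpired(createdMs, sevenDaysSec, restoreTimeMs)); // true

    // With the TTL reset during export, the same snapshot is kept.
    System.out.println(isExpired(createdMs, NO_TTL, restoreTimeMs)); // false
  }
}
```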
[jira] [Created] (HBASE-26956) ExportSnapshot tool supports removing TTL
Sun Xin created HBASE-26956:

Summary: ExportSnapshot tool supports removing TTL
Key: HBASE-26956
URL: https://issues.apache.org/jira/browse/HBASE-26956
Project: HBase
Issue Type: New Feature
Reporter: Sun Xin
Assignee: Sun Xin

In our scenario, we use ExportSnapshot to copy snapshots to cold storage like S3. But when a snapshot is restored back to the HBase cluster, it is deleted right away because its TTL is set.
So we need the ExportSnapshot tool to support removing the TTL.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (HBASE-26406) Can not add peer replicating to non-HBase
[ https://issues.apache.org/jira/browse/HBASE-26406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-26406.
Fix Version/s: 2.4.9, 3.0.0-alpha-2
Resolution: Fixed

Pushed to master and 2.x branches. Thanks all for reviewing.

> Can not add peer replicating to non-HBase
>
> Key: HBASE-26406
> URL: https://issues.apache.org/jira/browse/HBASE-26406
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 3.0.0-alpha-1, 2.4.0
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Major
> Fix For: 3.0.0-alpha-2, 2.4.9
>
> Adding a peer that replicates to a non-HBase system (like an MQ) through a custom ReplicationEndpoint fails. I got an exception like this in my UT:
> {code:java}
> 2021-10-29T15:14:47,632 INFO [RPCClient-NioEventLoopGroup-5-3] client.RawAsyncHBaseAdmin$ReplicationProcedureBiConsumer(2761): Operation: ADD_REPLICATION_PEER, peerId: 1 failed with Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
> org.apache.hadoop.hbase.DoNotRetryIOException: Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
>   at java.lang.Thread.getStackTrace(Thread.java:1559)
>   at org.apache.hadoop.hbase.util.FutureUtils.setStackTrace(FutureUtils.java:130)
>   at org.apache.hadoop.hbase.util.FutureUtils.rethrow(FutureUtils.java:149)
>   at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:186)
>   at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1948)
>   at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1936)
>   at org.apache.hadoop.hbase.replication.TestNonHBaseReplicationEndpoint.test(TestNonHBaseReplicationEndpoint.java:97)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:38)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
>   at Future.get(Unknown Source)
>   at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkClusterId(ReplicationPeerManager.java:527)
>   at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkPeerConfig(ReplicationPeerManager.java:367)
>   at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.preAddPeer(ReplicationPeerManager.java:123)
>   at org.apache.hadoop.hbase.master.replication.AddPeerProcedure.prePeerModification(AddPeerProcedure.java:101)
>   at org.apache.hadoop.hbase.master.replication.ModifyPeerProcedure.executeFromState(ModifyPeerProcedure.java:162)
>   at
[jira] [Created] (HBASE-26406) Can not add peer replicating to non-HBase
Sun Xin created HBASE-26406:

Summary: Can not add peer replicating to non-HBase
Key: HBASE-26406
URL: https://issues.apache.org/jira/browse/HBASE-26406
Project: HBase
Issue Type: Bug
Components: Replication
Affects Versions: 2.4.0, 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin

Adding a peer that replicates to a non-HBase system (like an MQ) through a custom ReplicationEndpoint fails. I got an exception like this in my UT:
{code:java}
2021-10-29T15:14:47,632 INFO [RPCClient-NioEventLoopGroup-5-3] client.RawAsyncHBaseAdmin$ReplicationProcedureBiConsumer(2761): Operation: ADD_REPLICATION_PEER, peerId: 1 failed with Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
org.apache.hadoop.hbase.DoNotRetryIOException: Invalid cluster key: , should not replicate to itself for HBaseInterClusterReplicationEndpoint
  at java.lang.Thread.getStackTrace(Thread.java:1559)
  at org.apache.hadoop.hbase.util.FutureUtils.setStackTrace(FutureUtils.java:130)
  at org.apache.hadoop.hbase.util.FutureUtils.rethrow(FutureUtils.java:149)
  at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:186)
  at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1948)
  at org.apache.hadoop.hbase.client.Admin.addReplicationPeer(Admin.java:1936)
  at org.apache.hadoop.hbase.replication.TestNonHBaseReplicationEndpoint.test(TestNonHBaseReplicationEndpoint.java:97)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
  at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
  at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:38)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.lang.Thread.run(Thread.java:748)
  at Future.get(Unknown Source)
  at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkClusterId(ReplicationPeerManager.java:527)
  at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.checkPeerConfig(ReplicationPeerManager.java:367)
  at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.preAddPeer(ReplicationPeerManager.java:123)
  at org.apache.hadoop.hbase.master.replication.AddPeerProcedure.prePeerModification(AddPeerProcedure.java:101)
  at org.apache.hadoop.hbase.master.replication.ModifyPeerProcedure.executeFromState(ModifyPeerProcedure.java:162)
  at org.apache.hadoop.hbase.master.replication.ModifyPeerProcedure.executeFromState(ModifyPeerProcedure.java:43)
  at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:190)
  at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1667)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1414)
  at
[jira] [Resolved] (HBASE-25773) TestSnapshotScannerHDFSAclController.setupBeforeClass is flaky
[ https://issues.apache.org/jira/browse/HBASE-25773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-25773.
Resolution: Fixed

Pushed to branch-2 and master, thanks [~zhangduo] for reviewing.

> TestSnapshotScannerHDFSAclController.setupBeforeClass is flaky
>
> Key: HBASE-25773
> URL: https://issues.apache.org/jira/browse/HBASE-25773
> Project: HBase
> Issue Type: Improvement
> Reporter: Xiaolin Ha
> Assignee: Sun Xin
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3140/2/testReport/org.apache.hadoop.hbase.security.access/TestSnapshotScannerHDFSAclController/precommit_checks___yetus_jdk8_Hadoop3_checks__/]
> SnapshotScannerHDFSAclController.postStartMaster alters hbase:acl to add a new cf "m", but `TestSnapshotScannerHDFSAclController.setupBeforeClass(TestSnapshotScannerHDFSAclController.java:101)` fails before the disable and enable of hbase:acl complete.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
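A common way to avoid this kind of race in test setup is to poll for the expected state before continuing. The sketch below is generic and hypothetical: `WaitForSketch.waitFor` is not an HBase API, the condition is a plain `BooleanSupplier`, and the source does not say that the actual fix took this form.

```java
import java.util.function.BooleanSupplier;

final class WaitForSketch {
  // Polls `condition` every intervalMs until it holds or timeoutMs elapses.
  // Returns true if the condition became true in time, false on timeout or interrupt.
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }

  public static void main(String[] args) {
    // Hypothetical stand-in for "the table has been re-enabled": true from the third poll on.
    int[] polls = {0};
    boolean ready = waitFor(1000, 1, () -> ++polls[0] >= 3);
    System.out.println("ready=" + ready + " after " + polls[0] + " polls");
  }
}
```

In a test, the condition would check that the altered table is enabled again before the setup code touches it.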
[jira] [Resolved] (HBASE-26194) Introduce a ReplicationServerSourceManager to simplify HReplicationServer
[ https://issues.apache.org/jira/browse/HBASE-26194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-26194.
Resolution: Done

Merged. Thanks [~stack] for reviewing.

> Introduce a ReplicationServerSourceManager to simplify HReplicationServer
>
> Key: HBASE-26194
> URL: https://issues.apache.org/jira/browse/HBASE-26194
> Project: HBase
> Issue Type: Sub-task
> Components: Replication
> Affects Versions: 3.0.0-alpha-2
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Major
> Fix For: 3.0.0-alpha-2

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26194) Introduce a ReplicationServerSourceManager to simplify HReplicationServer
Sun Xin created HBASE-26194:

Summary: Introduce a ReplicationServerSourceManager to simplify HReplicationServer
Key: HBASE-26194
URL: https://issues.apache.org/jira/browse/HBASE-26194
Project: HBase
Issue Type: Sub-task
Components: Replication
Affects Versions: 3.0.0-alpha-2
Reporter: Sun Xin
Assignee: Sun Xin
Fix For: 3.0.0-alpha-2

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-26084) Add owner of replication queue for ReplicationQueueInfo
[ https://issues.apache.org/jira/browse/HBASE-26084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-26084.
Fix Version/s: 3.0.0-alpha-2
Resolution: Done

Merged. Thanks [~stack] [~zhangduo] for reviewing.

> Add owner of replication queue for ReplicationQueueInfo
>
> Key: HBASE-26084
> URL: https://issues.apache.org/jira/browse/HBASE-26084
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 3.0.0-alpha-1
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Major
> Fix For: 3.0.0-alpha-2
>
> The current ReplicationQueueInfo only has a queueId, which is not enough to distinguish queues in ReplicationServer, so we need to add the RS holding the queue to ReplicationQueueInfo.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26084) Add owner of replication queue for ReplicationQueueInfo
Sun Xin created HBASE-26084:

Summary: Add owner of replication queue for ReplicationQueueInfo
Key: HBASE-26084
URL: https://issues.apache.org/jira/browse/HBASE-26084
Project: HBase
Issue Type: Sub-task
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
Fix For: 3.0.0-alpha-1

The current ReplicationQueueInfo only has a queueId, which is not enough to distinguish queues in ReplicationServer, so we need to add the RS holding the queue to ReplicationQueueInfo.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
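The need for an owner in addition to the queueId can be illustrated with a composite key. The class below is a hypothetical sketch, not the real ReplicationQueueInfo: `OwnedQueueId` and its fields are made up for this example, and server names are plain strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class OwnedQueueId {
  private final String owner;   // the RS currently holding the queue
  private final String queueId; // the queue id, which may repeat across owners

  OwnedQueueId(String owner, String queueId) {
    this.owner = Objects.requireNonNull(owner);
    this.queueId = Objects.requireNonNull(queueId);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof OwnedQueueId)) {
      return false;
    }
    OwnedQueueId other = (OwnedQueueId) o;
    return owner.equals(other.owner) && queueId.equals(other.queueId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(owner, queueId);
  }

  @Override
  public String toString() {
    return owner + "/" + queueId;
  }

  public static void main(String[] args) {
    // Keyed by queueId alone, these two queues would collide in one map.
    Map<OwnedQueueId, String> sources = new HashMap<>();
    sources.put(new OwnedQueueId("rs1", "peer1"), "source for rs1");
    sources.put(new OwnedQueueId("rs2", "peer1"), "source for rs2");
    System.out.println(sources.size()); // 2
  }
}
```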
[jira] [Resolved] (HBASE-25110) Add heartbeat for ReplicationServer and dispatch replication sources to ReplicationServer
[ https://issues.apache.org/jira/browse/HBASE-25110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-25110.
Release Note: This issue has been split into two issues, HBASE-26077 and HBASE-26078.
Resolution: Incomplete

> Add heartbeat for ReplicationServer and dispatch replication sources to ReplicationServer
>
> Key: HBASE-25110
> URL: https://issues.apache.org/jira/browse/HBASE-25110
> Project: HBase
> Issue Type: Sub-task
> Reporter: Guanghao Zhang
> Assignee: Sun Xin
> Priority: Major

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26078) Dispatch replication sources to ReplicationServer
Sun Xin created HBASE-26078:

Summary: Dispatch replication sources to ReplicationServer
Key: HBASE-26078
URL: https://issues.apache.org/jira/browse/HBASE-26078
Project: HBase
Issue Type: Sub-task
Components: Replication
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
Fix For: 3.0.0-alpha-1

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26077) Add heartbeat for ReplicationServer
Sun Xin created HBASE-26077:

Summary: Add heartbeat for ReplicationServer
Key: HBASE-26077
URL: https://issues.apache.org/jira/browse/HBASE-26077
Project: HBase
Issue Type: Sub-task
Affects Versions: 3.0.0-alpha-1
Reporter: Sun Xin
Assignee: Sun Xin
Fix For: 3.0.0-alpha-1

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25807) Move method reportProcedureDone from RegionServerStatus.proto to Master.proto
[ https://issues.apache.org/jira/browse/HBASE-25807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-25807.
Fix Version/s: 3.0.0-alpha-1
Resolution: Done

Merged. Thanks [~zhangduo] for reviewing.

> Move method reportProcedureDone from RegionServerStatus.proto to Master.proto
>
> Key: HBASE-25807
> URL: https://issues.apache.org/jira/browse/HBASE-25807
> Project: HBase
> Issue Type: Sub-task
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Major
> Fix For: 3.0.0-alpha-1
>
> Next, we need to use the procedure mechanism to implement enable/disable/refresh peer, and ReplicationServer also needs to call reportProcedureDone on the master, so I propose moving the method reportProcedureDone from RegionServerStatus.proto to Master.proto.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25820) Find a way to know whether logQueue goes empty when ReplicationSource is running on ReplicationServer
Sun Xin created HBASE-25820:

Summary: Find a way to know whether logQueue goes empty when ReplicationSource is running on ReplicationServer
Key: HBASE-25820
URL: https://issues.apache.org/jira/browse/HBASE-25820
Project: HBase
Issue Type: Sub-task
Reporter: Sun Xin

In HBASE-25110 we chose to use ZK to notify ReplicationServer that a new WAL was generated. This notification is asynchronous.

That leads to a problem: the shipper thread and the WAL reader thread may terminate because logQueue goes empty before the notification of the new WAL is received.

So we now need to find a way to know whether logQueue is really empty after the last WAL in logQueue is consumed.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24737) Find a way to resolve WALFileLengthProvider#getLogFileSizeIfBeingWritten problem
[ https://issues.apache.org/jira/browse/HBASE-24737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-24737.
Resolution: Done

> Find a way to resolve WALFileLengthProvider#getLogFileSizeIfBeingWritten problem
>
> Key: HBASE-24737
> URL: https://issues.apache.org/jira/browse/HBASE-24737
> Project: HBase
> Issue Type: Sub-task
> Reporter: Guanghao Zhang
> Assignee: Sun Xin
> Priority: Major
>
> Now we use WALFileLengthProvider#getLogFileSizeIfBeingWritten to get the synced WAL length and prevent replicating unacked log entries. But after offloading ReplicationSource to the new ReplicationServer, we need a new way to resolve this problem.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25807) Move method reportProcedureDone from RegionServerStatus.proto to Master.proto
Sun Xin created HBASE-25807:

Summary: Move method reportProcedureDone from RegionServerStatus.proto to Master.proto
Key: HBASE-25807
URL: https://issues.apache.org/jira/browse/HBASE-25807
Project: HBase
Issue Type: Sub-task
Reporter: Sun Xin

Next, we need to use the procedure mechanism to implement enable/disable/refresh peer, and ReplicationServer also needs to call reportProcedureDone on the master, so I propose moving the method reportProcedureDone from RegionServerStatus.proto to Master.proto.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25562) ReplicationSourceWALReader log and handle exception immediately without retrying
[ https://issues.apache.org/jira/browse/HBASE-25562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-25562.
Fix Version/s: 2.4.3, 2.3.5, 3.0.0-alpha-1
Resolution: Fixed

> ReplicationSourceWALReader log and handle exception immediately without retrying
>
> Key: HBASE-25562
> URL: https://issues.apache.org/jira/browse/HBASE-25562
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 3.0.0-alpha-1, 2.2.6, 2.3.4, 2.4.1
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.5, 2.4.3
>
> In [this piece of code about retrying in ReplicationSourceWALReader#run|https://github.com/apache/hbase/blob/0353909bc268e3ff3def098963d021e973f1f153/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L151], the sleep time increases with the number of retries. If an exception happens that cannot be recovered from automatically, error logs appear only after 12 hours (300 retries by default).

--
This message was sent by Atlassian Jira (v8.3.4#803005)
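The "12 hours" figure in HBASE-25562 follows from a linearly growing sleep. The sketch below reproduces the arithmetic under two assumptions that are not stated in the issue: a base sleep of 1 second per retry unit, and a multiplier that grows by 1 on each retry from 1 up to the retry limit. `RetryBackoffEstimate` and its method names are hypothetical.

```java
final class RetryBackoffEstimate {
  // Total time slept when retry m (1-based) sleeps for baseSleepMs * m.
  static long totalSleepMillis(long baseSleepMs, int maxRetries) {
    long total = 0;
    for (int multiplier = 1; multiplier <= maxRetries; multiplier++) {
      total += baseSleepMs * multiplier;
    }
    return total;
  }

  public static void main(String[] args) {
    long totalMs = totalSleepMillis(1000L, 300); // assumed 1 s base, 300 retries
    System.out.println(totalMs + " ms");                  // 45150000 ms
    System.out.println(totalMs / 3_600_000.0 + " hours"); // roughly 12.5 hours
  }
}
```

The sum 1 + 2 + ... + 300 is 45150, so with a 1 second base the reader sleeps for 45150 seconds in total, a little over 12.5 hours, before the retries are exhausted.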
[jira] [Created] (HBASE-25683) Simplify UTs using DummyServer
Sun Xin created HBASE-25683: --- Summary: Simplify UTs using DummyServer Key: HBASE-25683 URL: https://issues.apache.org/jira/browse/HBASE-25683 Project: HBase Issue Type: Test Components: test Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25638) The master local region is constantly major compact
[ https://issues.apache.org/jira/browse/HBASE-25638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sun Xin resolved HBASE-25638. - Resolution: Not A Problem > The master local region is constantly major compact > --- > > Key: HBASE-25638 > URL: https://issues.apache.org/jira/browse/HBASE-25638 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.3.4, 2.4.1 >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Major > > In > [MasterRegionFlusherAndCompactor.compact|https://github.com/apache/hbase/blob/830d2895b27fa0cf39a28d3af9673a4126ea8258/hbase-server/src/main/java/org/apache/hadoop/hbase/master/region/MasterRegionFlusherAndCompactor.java#L164], > we call region.compact(true) constantly like recursion. This caused a lot of > logs to be flushed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25638) The master local region is constantly major compact
Sun Xin created HBASE-25638: --- Summary: The master local region is constantly major compact Key: HBASE-25638 URL: https://issues.apache.org/jira/browse/HBASE-25638 Project: HBase Issue Type: Bug Affects Versions: 2.4.1, 2.3.4, 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin In [MasterRegionFlusherAndCompactor.compact|https://github.com/apache/hbase/blob/830d2895b27fa0cf39a28d3af9673a4126ea8258/hbase-server/src/main/java/org/apache/hadoop/hbase/master/region/MasterRegionFlusherAndCompactor.java#L164], we call region.compact(true) constantly like recursion. This caused a lot of logs to be flushed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25598) TestFromClientSide5.testScanMetrics is flaky
[ https://issues.apache.org/jira/browse/HBASE-25598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sun Xin resolved HBASE-25598. - Fix Version/s: 2.4.2 2.3.5 2.2.7 3.0.0-alpha-1 Resolution: Fixed Thanks [~zhangduo] for reviewing. Merged to master and all active branch-2.x. > TestFromClientSide5.testScanMetrics is flaky > > > Key: HBASE-25598 > URL: https://issues.apache.org/jira/browse/HBASE-25598 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.3.4, 2.4.1 >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Major > Fix For: 3.0.0-alpha-1, 2.2.7, 2.3.5, 2.4.2 > > > In some PRs, I got the following errors in UT results. > {code:java} > [ERROR] Errors: > [ERROR] org.apache.hadoop.hbase.client.TestFromClientSide5.testScanMetrics[0] > [ERROR] Run 1: TestFromClientSide5.testScanMetrics:1018 Did not count the > result bytes expected:<60> but was:<120> > [ERROR] Run 2: TestFromClientSide5.testScanMetrics:1036 Did not count the > result bytes expected:<60> but was:<180> > [ERROR] Run 3: TestFromClientSide5.testScanMetrics:951 » > MasterRegistryFetch Exception making... > [INFO] > [ERROR] > org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor5.testScanMetrics[1] > [ERROR] Run 1: > TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:1036 > Did not count the result bytes expected:<60> but was:<120> > [ERROR] Run 2: > TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:951 » > IO > [ERROR] Run 3: > TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:951 » > IO > [INFO] > {code} > I read the code further and found that this UT is flaky. 
> {code:java} > // check byte counters > scan2 = new Scan(); > scan2.setScanMetricsEnabled(true); > scan2.setCaching(1); > try (ResultScanner scanner = ht.getScanner(scan2)) { > int numBytes = 0; > for (Result result : scanner.next(1)) { > for (Cell cell : result.listCells()) { > numBytes += PrivateCellUtil.estimatedSerializedSizeOf(cell); > } > } > scanner.close(); > ScanMetrics scanMetrics = scanner.getScanMetrics(); > assertEquals("Did not count the result bytes", numBytes, > scanMetrics.countOfBytesInResults.get()); > } > {code} > In the code above, it is to check scanMetrics.countOfBytesInResults, but just > get only ONE row by scanner.next(1) . A total of 3 rows are inserted into the > table, and scanner prefetch from server in advance until maxCacheSize is > exceeded, see > [here|https://github.com/apache/hbase/blob/5fa15cfde3d77e77ffb1f09d60dce4db264f3831/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableResultScanner.java#L94]. > So if scanner prefetch more than one row before closing scanner, the UT > fails. we can reproduce this problem steadily by sleeping before > scanner.close(). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25598) TestFromClientSide5.testScanMetrics is flaky
Sun Xin created HBASE-25598: --- Summary: TestFromClientSide5.testScanMetrics is flaky Key: HBASE-25598 URL: https://issues.apache.org/jira/browse/HBASE-25598 Project: HBase Issue Type: Bug Affects Versions: 2.4.1, 2.3.4, 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin In some PRs, I got the following errors in UT results. {code:java} [ERROR] Errors: [ERROR] org.apache.hadoop.hbase.client.TestFromClientSide5.testScanMetrics[0] [ERROR] Run 1: TestFromClientSide5.testScanMetrics:1018 Did not count the result bytes expected:<60> but was:<120> [ERROR] Run 2: TestFromClientSide5.testScanMetrics:1036 Did not count the result bytes expected:<60> but was:<180> [ERROR] Run 3: TestFromClientSide5.testScanMetrics:951 » MasterRegistryFetch Exception making... [INFO] [ERROR] org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor5.testScanMetrics[1] [ERROR] Run 1: TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:1036 Did not count the result bytes expected:<60> but was:<120> [ERROR] Run 2: TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:951 » IO [ERROR] Run 3: TestFromClientSideWithCoprocessor5>TestFromClientSide5.testScanMetrics:951 » IO [INFO] {code} I read the code further and found that this UT is flaky. {code:java} // check byte counters scan2 = new Scan(); scan2.setScanMetricsEnabled(true); scan2.setCaching(1); try (ResultScanner scanner = ht.getScanner(scan2)) { int numBytes = 0; for (Result result : scanner.next(1)) { for (Cell cell : result.listCells()) { numBytes += PrivateCellUtil.estimatedSerializedSizeOf(cell); } } scanner.close(); ScanMetrics scanMetrics = scanner.getScanMetrics(); assertEquals("Did not count the result bytes", numBytes, scanMetrics.countOfBytesInResults.get()); } {code} In the code above, it is to check scanMetrics.countOfBytesInResults, but just get only ONE row by scanner.next(1) . 
A total of 3 rows are inserted into the table, and the scanner prefetches from the server in advance until maxCacheSize is exceeded, see [here|https://github.com/apache/hbase/blob/5fa15cfde3d77e77ffb1f09d60dce4db264f3831/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncTableResultScanner.java#L94]. So if the scanner prefetches more than one row before it is closed, the UT fails. We can reproduce this problem reliably by sleeping before scanner.close(). -- This message was sent by Atlassian Jira (v8.3.4#803005)
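The failure numbers in the report are consistent with this explanation. As a rough illustration only (a toy model, not HBase code; the class and method names are hypothetical, and the 60-byte row size is read off the `expected:<60>` in the log), the metric counts every row fetched from the server, while the test only sums the one row it consumed:

```java
// Toy model, not HBase code: compares the bytes counted for fetched rows
// with the bytes summed by a caller that consumes fewer rows.
final class PrefetchByteCountModel {
  // Bytes a scan metric would report if it counts every fetched row.
  static long bytesInFetchedRows(int rowsFetched, int bytesPerRow) {
    return (long) rowsFetched * bytesPerRow;
  }

  // Bytes the test sums up from the rows it actually reads.
  static long bytesInConsumedRows(int rowsConsumed, int bytesPerRow) {
    return (long) rowsConsumed * bytesPerRow;
  }

  // The test's assertion only holds when nothing extra was prefetched.
  static boolean assertionHolds(int rowsFetched, int rowsConsumed, int bytesPerRow) {
    return bytesInFetchedRows(rowsFetched, bytesPerRow)
        == bytesInConsumedRows(rowsConsumed, bytesPerRow);
  }
}
```

With 60-byte rows and one consumed row, fetching 1, 2, or 3 rows gives 60, 120, or 180 bytes, which lines up with the `was:<120>` and `was:<180>` failures.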
[jira] [Created] (HBASE-25590) Bulkload replication HFileRefs cannot be cleared in some cases where set exclude-namespace/exclude-table-cfs
Sun Xin created HBASE-25590: --- Summary: Bulkload replication HFileRefs cannot be cleared in some cases where set exclude-namespace/exclude-table-cfs Key: HBASE-25590 URL: https://issues.apache.org/jira/browse/HBASE-25590 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 2.4.1, 2.3.4, 2.2.6, 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin In [ReplicationSource#addHFileRefs|https://github.com/apache/hbase/blob/ed90a14995acd87111d2b9849f07d84418ca43d4/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L264], we may add unwanted HFiles to the _HFileRefs_ if a peer has _replicate_all_ set to true and also sets _exclude-namespace/exclude-table-cfs_. These unwanted _HFileRefs_ are never replicated to the remote cluster and are never cleared. Two problems are caused by this bug: # The metric sizeOfHFileRefsQueue cannot be zeroed. # Referenced HFiles cannot be deleted by _ReplicationHFileCleaner._ -- This message was sent by Atlassian Jira (v8.3.4#803005)
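To make the missing check concrete, here is a minimal sketch of the kind of filter that would keep excluded tables out of the HFileRefs. It is not the HBase implementation: the class, the method, and the flat "namespace:table" naming are hypothetical, and column-family-level excludes are left out for brevity.

```java
import java.util.Set;

// Hypothetical sketch, not HBase code: decides whether a bulk-loaded table
// should be tracked for a peer that uses replicate_all plus exclude lists.
final class PeerExcludeFilterSketch {
  static boolean shouldTrackHFiles(boolean replicateAll,
      Set<String> excludeNamespaces, Set<String> excludeTables,
      String namespace, String table) {
    if (!replicateAll) {
      // Include-list peers are out of scope for this sketch.
      return false;
    }
    if (excludeNamespaces.contains(namespace)) {
      return false; // whole namespace excluded
    }
    // Tables are keyed as "namespace:table" here; an assumption for illustration.
    return !excludeTables.contains(namespace + ":" + table);
  }
}
```

Only tables for which such a check returns true would need entries in the HFileRefs queue; anything else would never be shipped and so never cleared.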
[jira] [Resolved] (HBASE-25559) Terminate threads of oldsources while RS is closing
[ https://issues.apache.org/jira/browse/HBASE-25559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sun Xin resolved HBASE-25559. - Fix Version/s: 2.4.2 2.3.5 2.2.7 3.0.0-alpha-1 Resolution: Fixed Merged to master and all active branch-2.x. > Terminate threads of oldsources while RS is closing > --- > > Key: HBASE-25559 > URL: https://issues.apache.org/jira/browse/HBASE-25559 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.2.6, 2.3.4, 2.4.1 >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Major > Fix For: 3.0.0-alpha-1, 2.2.7, 2.3.5, 2.4.2 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25562) ReplicationSourceWALReader log and handle exception immediately without retrying
Sun Xin created HBASE-25562: --- Summary: ReplicationSourceWALReader log and handle exception immediately without retrying Key: HBASE-25562 URL: https://issues.apache.org/jira/browse/HBASE-25562 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 2.4.1, 2.3.4, 2.2.6, 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin In the retry code in ReplicationSourceWALReader#run, the sleep time increases with the number of retries. If an exception happens that cannot be recovered from by retrying, the error is not logged until about 12 hours later (after 300 retries by default). -- This message was sent by Atlassian Jira (v8.3.4#803005)
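As a rough check of the 12-hour figure, here is a minimal sketch. It assumes each failed attempt sleeps for a base interval times a multiplier that grows by one per retry up to the maximum; the 1000 ms base sleep is an assumption for illustration, and the class and method names are hypothetical, not HBase code.

```java
// Hypothetical helper, not part of HBase: totals a linearly growing retry back-off.
final class RetrySleepEstimate {
  // Sums baseSleepMs * 1 + baseSleepMs * 2 + ... + baseSleepMs * maxMultiplier.
  static long totalSleepMillis(long baseSleepMs, int maxMultiplier) {
    long total = 0;
    for (int multiplier = 1; multiplier <= maxMultiplier; multiplier++) {
      total += baseSleepMs * multiplier;
    }
    return total;
  }

  // Converts milliseconds to whole hours, rounding down.
  static long wholeHours(long millis) {
    return millis / (60L * 60L * 1000L);
  }
}
```

Under these assumptions, 300 retries add up to 1000 * 300 * 301 / 2 = 45,150,000 ms, a little over 12.5 hours, which matches the delay described in the issue.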
[jira] [Created] (HBASE-25560) Remove unused parameter named peerId in the constructor method of CatalogReplicationSourcePeer
Sun Xin created HBASE-25560: --- Summary: Remove unused parameter named peerId in the constructor method of CatalogReplicationSourcePeer Key: HBASE-25560 URL: https://issues.apache.org/jira/browse/HBASE-25560 Project: HBase Issue Type: Bug Reporter: Sun Xin Assignee: Sun Xin -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25559) Terminate threads of oldsources while RS is closing
Sun Xin created HBASE-25559: --- Summary: Terminate threads of oldsources while RS is closing Key: HBASE-25559 URL: https://issues.apache.org/jira/browse/HBASE-25559 Project: HBase Issue Type: Bug Affects Versions: 2.4.1, 2.3.4, 2.2.6, 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25553) It is better for ReplicationTracker.getListOfRegionServers to return ServerName instead of String
[ https://issues.apache.org/jira/browse/HBASE-25553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sun Xin resolved HBASE-25553. - Resolution: Fixed > It is better for ReplicationTracker.getListOfRegionServers to return > ServerName instead of String > - > > Key: HBASE-25553 > URL: https://issues.apache.org/jira/browse/HBASE-25553 > Project: HBase > Issue Type: Umbrella >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Major > Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.3.5, 2.4.2 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25553) It is better for ReplicationTracker.getListOfRegionServers to return ServerName instead of String
Sun Xin created HBASE-25553: --- Summary: It is better for ReplicationTracker.getListOfRegionServers to return ServerName instead of String Key: HBASE-25553 URL: https://issues.apache.org/jira/browse/HBASE-25553 Project: HBase Issue Type: Bug Reporter: Sun Xin Assignee: Sun Xin -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25309) Support start/stop replication server by scripts
Sun Xin created HBASE-25309: --- Summary: Support start/stop replication server by scripts Key: HBASE-25309 URL: https://issues.apache.org/jira/browse/HBASE-25309 Project: HBase Issue Type: Sub-task Components: Replication Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25305) Add master UI to show ReplicationServer
Sun Xin created HBASE-25305: --- Summary: Add master UI to show ReplicationServer Key: HBASE-25305 URL: https://issues.apache.org/jira/browse/HBASE-25305 Project: HBase Issue Type: Sub-task Components: Replication Reporter: Sun Xin Assignee: Sun Xin -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25300) 'Unknown table hbase:quota' happens when desc table in shell if quota disabled
Sun Xin created HBASE-25300: --- Summary: 'Unknown table hbase:quota' happens when desc table in shell if quota disabled Key: HBASE-25300 URL: https://issues.apache.org/jira/browse/HBASE-25300 Project: HBase Issue Type: Bug Components: shell Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25289) [testing] Clean up resources after tests in rsgroup_shell_test.rb
Sun Xin created HBASE-25289: --- Summary: [testing] Clean up resources after tests in rsgroup_shell_test.rb Key: HBASE-25289 URL: https://issues.apache.org/jira/browse/HBASE-25289 Project: HBase Issue Type: Improvement Components: rsgroup, test Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 In rsgroup_shell_test.rb, some tests don't remove rsgroups or drop tables afterwards, which gets in the way when adding new tests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25171) Remove ZNodePaths.namespaceZNode
Sun Xin created HBASE-25171: --- Summary: Remove ZNodePaths.namespaceZNode Key: HBASE-25171 URL: https://issues.apache.org/jira/browse/HBASE-25171 Project: HBase Issue Type: Improvement Components: Zookeeper Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 HBASE-21154 removed the dependency on ZNodePaths.namespaceZNode, so this field can be removed too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25117) ReplicationSourceShipper thread can not be finished
Sun Xin created HBASE-25117: --- Summary: ReplicationSourceShipper thread can not be finished Key: HBASE-25117 URL: https://issues.apache.org/jira/browse/HBASE-25117 Project: HBase Issue Type: Bug Reporter: Sun Xin Assignee: Sun Xin See [Flaky Tests|https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/master/161/console]: some replication UTs failed because of timeouts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25113) [testing] HBaseCluster support ReplicationServer for UTs
Sun Xin created HBASE-25113: --- Summary: [testing] HBaseCluster support ReplicationServer for UTs Key: HBASE-25113 URL: https://issues.apache.org/jira/browse/HBASE-25113 Project: HBase Issue Type: Sub-task Components: Replication Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25100) conn is assigned twice in HBaseReplicationEndpoint and HBaseInterClusterReplicationEndpoint
Sun Xin created HBASE-25100: --- Summary: conn is assigned twice in HBaseReplicationEndpoint and HBaseInterClusterReplicationEndpoint Key: HBASE-25100 URL: https://issues.apache.org/jira/browse/HBASE-25100 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 conn is assigned twice: once in [HBaseReplicationEndpoint.init()|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java#L109] and again in [HBaseInterClusterReplicationEndpoint.init|https://github.com/apache/hbase/blob/c312760819ed185cab3a0717a1ea0ff6e8c47a23/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L145]. The latter class is a subclass of the former. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25098) ReplicationStatisticsChore runs in wrong time unit
Sun Xin created HBASE-25098: --- Summary: ReplicationStatisticsChore runs in wrong time unit Key: HBASE-25098 URL: https://issues.apache.org/jira/browse/HBASE-25098 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25014) ScheduledChore is never triggered when initalDelay > 1.5*period
Sun Xin created HBASE-25014: --- Summary: ScheduledChore is never triggered when initalDelay > 1.5*period Key: HBASE-25014 URL: https://issues.apache.org/jira/browse/HBASE-25014 Project: HBase Issue Type: Bug Affects Versions: 2.2.5, 2.2.4, 2.2.3, 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 In our recent tests, ScheduledChore is never triggered when initialDelay > 1.5 * period. The cause of the bug is the following: The trigger time of a ScheduledChore must fall within an acceptable time window of 1.5 * period, see [here|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ScheduledChore.java#L234]. timeOfLastRun and timeOfThisRun are two variables that record two adjacent trigger times. [The first initialization of timeOfThisRun|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ScheduledChore.java#L273] happens when the ScheduledChore is created, so it is not a real trigger time. If we set initialDelay > 1.5 * period, the first trigger after initialDelay already falls outside the allowed window, so [the chore is cancelled and scheduled again|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ChoreService.java#L176]. So it is stuck in a loop when initialDelay > 1.5 * period: 1. timeOfThisRun is initialized at the wrong time. 2. Wait for initialDelay. 3. The chore triggers, but outside the allowed window. 4. The chore is cancelled and scheduled again. 5. Go to step 1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
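The loop described in this issue can be reduced to a small model. This is a sketch under the assumptions stated in the issue (a 1.5 * period window, and a timeOfThisRun that is first set at creation time); the class and method names are hypothetical and the code is not taken from ScheduledChore.

```java
// Toy model of the window check, not the ScheduledChore implementation.
final class ChoreWindowModel {
  // True when the gap between two adjacent trigger times exceeds 1.5 * period.
  static boolean missedStartTime(long timeOfLastRun, long timeOfThisRun, long periodMs) {
    return (timeOfThisRun - timeOfLastRun) > 1.5 * periodMs;
  }

  // Models the first trigger: the "last run" is really the creation time,
  // so the measured gap is simply the initial delay.
  static boolean firstRunMissed(long createdAtMs, long initialDelayMs, long periodMs) {
    long timeOfLastRun = createdAtMs;                  // stale value set at creation
    long timeOfThisRun = createdAtMs + initialDelayMs; // first real trigger
    return missedStartTime(timeOfLastRun, timeOfThisRun, periodMs);
  }
}
```

In this model, a period of 1000 ms with an initial delay of 2000 ms is always reported as a missed start, and since rescheduling resets the same state, every later attempt would be rejected in the same way.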
[jira] [Created] (HBASE-25012) HBASE-24359 causes replication missed log of some RemoteException
Sun Xin created HBASE-25012: --- Summary: HBASE-24359 causes replication missed log of some RemoteException Key: HBASE-25012 URL: https://issues.apache.org/jira/browse/HBASE-25012 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 2.3.1, 2.3.0, 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 [HBASE-24359|https://issues.apache.org/jira/browse/HBASE-24359] broke the exception handling logic. In branch-2, it even causes the logs of some RemoteExceptions to be missed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24999) Master manages ReplicationServers
Sun Xin created HBASE-24999: --- Summary: Master manages ReplicationServers Key: HBASE-24999 URL: https://issues.apache.org/jira/browse/HBASE-24999 Project: HBase Issue Type: Sub-task Components: Replication Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin [HBASE-24683|https://issues.apache.org/jira/browse/HBASE-24683] added an isolated ReplicationServer. This issue will do the following: # ReplicationServer reports to Master periodically. # Add a basic ReplicationServerManager in Master to manage ReplicationServer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24982) Disassemble the method replicateWALEntry from AdminService to a new interface ReplicationSinkService
Sun Xin created HBASE-24982: --- Summary: Disassemble the method replicateWALEntry from AdminService to a new interface ReplicationSinkService Key: HBASE-24982 URL: https://issues.apache.org/jira/browse/HBASE-24982 Project: HBase Issue Type: Sub-task Components: Replication Reporter: Sun Xin Assignee: Sun Xin -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24683) Add a basic ReplicationServer which only implement ReplicationSink Service
[ https://issues.apache.org/jira/browse/HBASE-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sun Xin resolved HBASE-24683. - Resolution: Resolved > Add a basic ReplicationServer which only implement ReplicationSink Service > -- > > Key: HBASE-24683 > URL: https://issues.apache.org/jira/browse/HBASE-24683 > Project: HBase > Issue Type: Sub-task >Reporter: Guanghao Zhang >Assignee: Sun Xin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24914) Reomve duplicate code appearing continuously in method ReplicationPeerManager.updatePeerConfig
Sun Xin created HBASE-24914: --- Summary: Reomve duplicate code appearing continuously in method ReplicationPeerManager.updatePeerConfig Key: HBASE-24914 URL: https://issues.apache.org/jira/browse/HBASE-24914 Project: HBase Issue Type: Improvement Components: Replication Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 In [ReplicationPeerManager.updatePeerConfig|https://github.com/apache/hbase/blob/1164531d5ab519ab58af82ba3849f8fcded3453f/hbase-server/src/main/java/org/apache/hadoop/hbase/master/replication/ReplicationPeerManager.java#L272], I found the same two lines of code repeated back to back, so one copy can be removed. {code:java} newPeerConfigBuilder.putAllConfiguration(oldPeerConfig.getConfiguration()); newPeerConfigBuilder.putAllConfiguration(peerConfig.getConfiguration()); newPeerConfigBuilder.putAllConfiguration(oldPeerConfig.getConfiguration()); newPeerConfigBuilder.putAllConfiguration(peerConfig.getConfiguration()); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
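Removing one copy should be safe if putAllConfiguration behaves like Map.putAll, which is an assumption here. The sketch below uses a plain HashMap as a stand-in for the builder (the class and method names are hypothetical) and shows that applying the old-then-new pair once or twice gives the same result.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the peer config builder, assuming plain map "put all" semantics.
final class RepeatedPutAll {
  // Applies the old configuration and then the new one, the given number of times.
  static Map<String, String> merge(Map<String, String> oldConf,
      Map<String, String> newConf, int repeats) {
    Map<String, String> merged = new HashMap<>();
    for (int i = 0; i < repeats; i++) {
      merged.putAll(oldConf); // old values first
      merged.putAll(newConf); // new values override old ones
    }
    return merged;
  }
}
```

Because the new values are always written last, the second pass restores exactly the state left by the first one.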
[jira] [Created] (HBASE-24913) Refactor TestJMXConnectorServer
Sun Xin created HBASE-24913: --- Summary: Refactor TestJMXConnectorServer Key: HBASE-24913 URL: https://issues.apache.org/jira/browse/HBASE-24913 Project: HBase Issue Type: Improvement Components: test Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 Two optimization points for TestJMXConnectorServer in this issue: # Just run cluster once, not once per test case. # Use random free port to run ConnectorServer, avoid specifying a fixed port. -- This message was sent by Atlassian Jira (v8.3.4#803005)
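For the second point, one common way to pick a random free port in Java is to bind a ServerSocket to port 0 and read back the port the OS assigned. This is a generic sketch with a hypothetical class name, not the code used in TestJMXConnectorServer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Generic helper, not taken from the HBase test code.
final class FreePortFinder {
  // Binds to port 0 so the OS picks a free ephemeral port, then releases it.
  static int findFreePort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The port is released before the caller uses it, so another process could in principle grab it in between; tests that need to be fully robust usually retry on a bind failure.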
[jira] [Created] (HBASE-24797) Move log code out of loop
Sun Xin created HBASE-24797: --- Summary: Move log code out of loop Key: HBASE-24797 URL: https://issues.apache.org/jira/browse/HBASE-24797 Project: HBase Issue Type: Bug Components: Normalizer Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 In HMaster#normalizeRegions, maybe we should move the logging of submittedPlanProcIds out of the loop. {code:java} public boolean normalizeRegions() throws IOException { ... final List submittedPlanProcIds = new ArrayList<>(); for (TableName table : allEnabledTables) { ... for (NormalizationPlan plan : plans) { long procId = plan.submit(this); submittedPlanProcIds.add(procId); ... } int totalPlansSubmitted = submittedPlanProcIds.size(); if (totalPlansSubmitted > 0 && LOG.isDebugEnabled()) { LOG.debug("Normalizer plans submitted. Total plans count: {} , procID list: {}", totalPlansSubmitted, submittedPlanProcIds); } } ... } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
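A minimal sketch of the proposed shape, with the summary emitted once after all tables have been processed. It is a simplified model rather than the HMaster code: plan submission is replaced by pre-computed procedure ids, and log lines are collected in a list so the behaviour is easy to check. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of normalizeRegions with the summary log moved out of the table loop.
final class LogOutsideLoopSketch {
  // plansPerTable holds, per table, the procedure ids its plans would return on submit.
  // Returns the log lines that would be emitted.
  static List<String> normalize(List<List<Long>> plansPerTable) {
    List<String> logLines = new ArrayList<>();
    List<Long> submittedPlanProcIds = new ArrayList<>();
    for (List<Long> plans : plansPerTable) {
      for (Long procId : plans) {
        submittedPlanProcIds.add(procId); // stands in for plan.submit(...)
      }
    }
    // Logged once, after every table has been handled.
    int totalPlansSubmitted = submittedPlanProcIds.size();
    if (totalPlansSubmitted > 0) {
      logLines.add("Normalizer plans submitted. Total plans count: " + totalPlansSubmitted
          + " , procID list: " + submittedPlanProcIds);
    }
    return logLines;
  }
}
```

With the check inside the table loop, the same cumulative list would be logged again for each table; here it appears a single time.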
[jira] [Created] (HBASE-24769) Auto scale RSGroup
Sun Xin created HBASE-24769: --- Summary: Auto scale RSGroup Key: HBASE-24769 URL: https://issues.apache.org/jira/browse/HBASE-24769 Project: HBase Issue Type: New Feature Components: rsgroup Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 Currently, if RSs go offline or come online, we must manually move them into or out of RSGroups. Based on HBASE-24431, we can now configure how many servers each rsgroup needs, and then add an AutoScaleChore to periodically check and move servers. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24760) Allow system tables fallback to any rs groups
Sun Xin created HBASE-24760: --- Summary: Allow system tables fallback to any rs groups Key: HBASE-24760 URL: https://issues.apache.org/jira/browse/HBASE-24760 Project: HBase Issue Type: New Feature Components: rsgroup Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 In HBASE-22738 we allowed tables to fall back to specific rsgroups if there are no online servers in the table's rsgroup. System tables, however, need to be able to fall back to any rsgroup in order to stay available at all times. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24759) Persisting configuration of default rsgroup
Sun Xin created HBASE-24759: --- Summary: Persisting configuration of default rsgroup Key: HBASE-24759 URL: https://issues.apache.org/jira/browse/HBASE-24759 Project: HBase Issue Type: New Feature Components: rsgroup Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 Currently, we don't store the default rsgroup information. But HBASE-24431 added a config map, which needs to be persisted to avoid losing the config of the default rsgroup. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24654) Allow unset table's rsgroup
Sun Xin created HBASE-24654: --- Summary: Allow unset table's rsgroup Key: HBASE-24654 URL: https://issues.apache.org/jira/browse/HBASE-24654 Project: HBase Issue Type: New Feature Components: rsgroup Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 TableDescriptorBuilder has a method to set the rsgroup but none to unset it. An unset method is necessary in some cases. For example, if a table had an rsgroup config before and I now want it to use the namespace config, setting the table's rsgroup config to the default rsgroup doesn't work; the table's rsgroup config must be removed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24591) get_table_rsgroup ignored the existence of rsgroup config for namespace
Sun Xin created HBASE-24591: --- Summary: get_table_rsgroup ignored the existence of rsgroup config for namespace Key: HBASE-24591 URL: https://issues.apache.org/jira/browse/HBASE-24591 Project: HBase Issue Type: Bug Components: rsgroup Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 {code:java} public GetRSGroupInfoOfTableResponse getRSGroupInfoOfTable(RpcController controller, GetRSGroupInfoOfTableRequest request) throws ServiceException { TableName tableName = ProtobufUtil.toTableName(request.getTableName()); ... try { ... GetRSGroupInfoOfTableResponse resp; TableDescriptor td = master.getTableDescriptors().get(tableName); if (td == null) { resp = GetRSGroupInfoOfTableResponse.getDefaultInstance(); } else { RSGroupInfo rsGroupInfo = null; if (td.getRegionServerGroup().isPresent()) { rsGroupInfo = master.getRSGroupInfoManager().getRSGroup(td.getRegionServerGroup().get()); } if (rsGroupInfo == null) { rsGroupInfo = master.getRSGroupInfoManager().getRSGroup(RSGroupInfo.DEFAULT_GROUP); } resp = GetRSGroupInfoOfTableResponse.newBuilder() .setRSGroupInfo(ProtobufUtil.toProtoGroupInfo(rsGroupInfo)).build(); } ... return resp; } catch (IOException e) { throw new ServiceException(e); } } {code} The method MasterRpcServices#getRSGroupInfoOfTable ignores the namespace's hbase.rsgroup.name config. The lookup should be replaced by RSGroupUtil#getRSGroupInfo. -- This message was sent by Atlassian Jira (v8.3.4#803005)
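The lookup order this issue is asking for can be sketched as follows. This is a simplified model, assuming the intended precedence is table config, then namespace config, then the default group; the class and method are hypothetical, the "default" string is a stand-in for RSGroupInfo.DEFAULT_GROUP, and the case where a configured group no longer exists is not modelled.

```java
import java.util.Optional;

// Hypothetical model of the lookup order, not the RSGroupUtil implementation.
final class RSGroupLookupOrder {
  static final String DEFAULT_GROUP = "default"; // stand-in value for illustration

  // Table-level config wins, then the namespace-level config, then the default group.
  static String resolve(Optional<String> tableGroup, Optional<String> namespaceGroup) {
    if (tableGroup.isPresent()) {
      return tableGroup.get();
    }
    return namespaceGroup.orElse(DEFAULT_GROUP);
  }
}
```

The code quoted in the issue goes straight from the table-level check to the default group, which corresponds to skipping the middle step of this model.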
[jira] [Created] (HBASE-24431) RSGroupInfo add configuration map to store something extra
Sun Xin created HBASE-24431: --- Summary: RSGroupInfo add configuration map to store something extra Key: HBASE-24431 URL: https://issues.apache.org/jira/browse/HBASE-24431 Project: HBase Issue Type: Improvement Components: rsgroup Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 Maybe we should add a _Map configuration_ to RSGroupInfo to store extra information. For example, we can store the minimum number of machines the group needs, in order to move machines into this group automatically. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24416) RegionNormalizer spliting region should not be limited by hbase.normalizer.min.region.count
Sun Xin created HBASE-24416: --- Summary: RegionNormalizer spliting region should not be limited by hbase.normalizer.min.region.count Key: HBASE-24416 URL: https://issues.apache.org/jira/browse/HBASE-24416 Project: HBase Issue Type: Improvement Affects Versions: 3.0.0-alpha-1 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 In the method computePlanForTable of SimpleRegionNormalizer, we skip splitting regions if the number of regions in the table is less than hbase.normalizer.min.region.count, even if there is a huge region in the table. {code:java} ... if (tableRegions == null || tableRegions.size() < minRegionCount) { ... return null; } ... // get region split plan if (splitEnabled) { List splitPlans = getSplitNormalizationPlan(table); if (splitPlans != null) { plans.addAll(splitPlans); } } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
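One possible arrangement, sketched here as an assumption rather than as the eventual fix, is to apply the minimum region count only to merges and let splits be considered whenever splitting is enabled. The class and method names are hypothetical.

```java
// Hypothetical gating logic, not the SimpleRegionNormalizer implementation.
final class NormalizerGateSketch {
  // Splits are considered for any non-empty table when splitting is enabled.
  static boolean considerSplit(boolean splitEnabled, int regionCount) {
    return splitEnabled && regionCount > 0;
  }

  // Merges still respect the configured minimum region count.
  static boolean considerMerge(boolean mergeEnabled, int regionCount, int minRegionCount) {
    return mergeEnabled && regionCount >= minRegionCount;
  }
}
```

Under this arrangement a table with a single huge region and a minimum region count of 3 would still be eligible for a split, while merges on it stay disabled.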
[jira] [Created] (HBASE-24399) [Flakey Tests] Some UTs about RSGroup should wait RSGroupInfoManager to be online
Sun Xin created HBASE-24399: --- Summary: [Flakey Tests] Some UTs about RSGroup should wait RSGroupInfoManager to be online Key: HBASE-24399 URL: https://issues.apache.org/jira/browse/HBASE-24399 Project: HBase Issue Type: Improvement Components: rsgroup Affects Versions: 2.3.0 Reporter: Sun Xin Assignee: Sun Xin Fix For: 2.3.0 addRSGroup accesses the table hbase:rsgroup, so the RSGroup UTs should make sure RSGroupInfoManagerImpl is online before testing. Otherwise, the following exception may be seen. {code:java} java.io.IOException: java.io.IOException: Only servers in default group can be updated during offline mode at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl.flushConfig(RSGroupInfoManagerImpl.java:602) at org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl.addRSGroup(RSGroupInfoManagerImpl.java:217) at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.addRSGroup(RSGroupAdminServer.java:391) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
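The waiting step can be illustrated with a generic polling helper. This is a self-contained sketch with hypothetical names; in a real UT the condition would be whatever check reports that the rsgroup manager is online, and the project's own test utilities may already offer an equivalent wait method.

```java
import java.util.function.BooleanSupplier;

// Generic polling helper for tests; not taken from the HBase test utilities.
final class WaitUntilOnline {
  // Polls the condition every intervalMs until it is true or timeoutMs has elapsed.
  // Returns true if the condition became true in time, false otherwise.
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }
}
```

A test would call such a helper before its first addRSGroup and fail fast if it returns false.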
[jira] [Created] (HBASE-24359) Optionally ignore edits for deleted CFs for replication.
Sun Xin created HBASE-24359: --- Summary: Optionally ignore edits for deleted CFs for replication. Key: HBASE-24359 URL: https://issues.apache.org/jira/browse/HBASE-24359 Project: HBase Issue Type: Improvement Components: Replication Affects Versions: 2.2.4 Reporter: Sun Xin Assignee: Sun Xin Fix For: 3.0.0-alpha-1 Replication gets stuck after we delete CFs from both the source and the sink if the source still has outstanding edits that it can no longer get rid of. All replication is then backed up behind these unreplicatable edits. We should have an option to ignore edits for deleted CFs at the source. This issue is similar to [HBASE-12091|https://issues.apache.org/jira/browse/HBASE-12091] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24166) Duplicate implementation for acquireLock between CreateTableProcedure and its parent class
[ https://issues.apache.org/jira/browse/HBASE-24166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-24166.
-
Resolution: Duplicate

> Duplicate implementation for acquireLock between CreateTableProcedure and its parent class
> --
>
> Key: HBASE-24166
> URL: https://issues.apache.org/jira/browse/HBASE-24166
> Project: HBase
> Issue Type: Improvement
> Components: proc-v2
> Affects Versions: 3.0.0, 2.2.4
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Minor
> Fix For: 3.0.0
>
> The _acquireLock_ overrides in _CreateTableProcedure_ and _InitMetaProcedure_ are identical to the implementation in their parent class _AbstractStateMachineTableProcedure_, so the overrides in the subclasses can be deleted.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24166) Duplicate implementation for acquireLock between CreateTableProcedure and its parent class
Sun Xin created HBASE-24166:
---

Summary: Duplicate implementation for acquireLock between CreateTableProcedure and its parent class
Key: HBASE-24166
URL: https://issues.apache.org/jira/browse/HBASE-24166
Project: HBase
Issue Type: Improvement
Components: proc-v2
Affects Versions: 2.2.4, 3.0.0
Reporter: Sun Xin
Assignee: Sun Xin
Fix For: 3.0.0

The _acquireLock_ overrides in _CreateTableProcedure_ and _InitMetaProcedure_ are identical to the implementation in their parent class _AbstractStateMachineTableProcedure_, so the overrides in the subclasses can be deleted.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-23376) NPE happens while replica region is moving
[ https://issues.apache.org/jira/browse/HBASE-23376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin reopened HBASE-23376:
-

> NPE happens while replica region is moving
> --
>
> Key: HBASE-23376
> URL: https://issues.apache.org/jira/browse/HBASE-23376
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Minor
> Attachments: HBASE-23376.branch-2.001.patch
>
> The following code is from AsyncNonMetaRegionLocator#addToCache
> {code:java}
> private RegionLocations addToCache(TableCache tableCache, RegionLocations locs) {
>   LOG.trace("Try adding {} to cache", locs);
>   byte[] startKey = locs.getDefaultRegionLocation().getRegion().getStartKey();
>   ...
> }{code}
> An NPE is thrown if locs has no default region location.
>
> The following code is from AsyncRegionLocatorHelper#updateCachedLocationOnError
> {code:java}
> ...
> if (cause instanceof RegionMovedException) {
>   RegionMovedException rme = (RegionMovedException) cause;
>   HRegionLocation newLoc =
>     new HRegionLocation(loc.getRegion(), rme.getServerName(), rme.getLocationSeqNum());
>   LOG.debug("Try updating {} with the new location {} constructed by {}", loc, newLoc,
>     rme.toString());
>   addToCache.accept(newLoc);
> ...{code}
> If a replica region is moving, we get a RegionMovedException and add the HRegionLocation of the replica region to the cache, which finally triggers the NPE.
>
> {code:java}
> java.lang.NullPointerException
>   at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.addToCache(AsyncNonMetaRegionLocator.java:240)
>   at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.addLocationToCache(AsyncNonMetaRegionLocator.java:596)
>   at org.apache.hadoop.hbase.client.AsyncRegionLocatorHelper.updateCachedLocationOnError(AsyncRegionLocatorHelper.java:80)
>   at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.updateCachedLocationOnError(AsyncNonMetaRegionLocator.java:610)
>   at org.apache.hadoop.hbase.client.AsyncRegionLocator.updateCachedLocationOnError(AsyncRegionLocator.java:153)
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23376) NPE happens while replica region is moving
[ https://issues.apache.org/jira/browse/HBASE-23376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Xin resolved HBASE-23376.
-
Resolution: Fixed

> NPE happens while replica region is moving
> --
>
> Key: HBASE-23376
> URL: https://issues.apache.org/jira/browse/HBASE-23376
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Reporter: Sun Xin
> Assignee: Sun Xin
> Priority: Minor
> Attachments: HBASE-23376.branch-2.001.patch
>
> The following code is from AsyncNonMetaRegionLocator#addToCache
> {code:java}
> private RegionLocations addToCache(TableCache tableCache, RegionLocations locs) {
>   LOG.trace("Try adding {} to cache", locs);
>   byte[] startKey = locs.getDefaultRegionLocation().getRegion().getStartKey();
>   ...
> }{code}
> An NPE is thrown if locs has no default region location.
>
> The following code is from AsyncRegionLocatorHelper#updateCachedLocationOnError
> {code:java}
> ...
> if (cause instanceof RegionMovedException) {
>   RegionMovedException rme = (RegionMovedException) cause;
>   HRegionLocation newLoc =
>     new HRegionLocation(loc.getRegion(), rme.getServerName(), rme.getLocationSeqNum());
>   LOG.debug("Try updating {} with the new location {} constructed by {}", loc, newLoc,
>     rme.toString());
>   addToCache.accept(newLoc);
> ...{code}
> If a replica region is moving, we get a RegionMovedException and add the HRegionLocation of the replica region to the cache, which finally triggers the NPE.
>
> {code:java}
> java.lang.NullPointerException
>   at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.addToCache(AsyncNonMetaRegionLocator.java:240)
>   at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.addLocationToCache(AsyncNonMetaRegionLocator.java:596)
>   at org.apache.hadoop.hbase.client.AsyncRegionLocatorHelper.updateCachedLocationOnError(AsyncRegionLocatorHelper.java:80)
>   at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.updateCachedLocationOnError(AsyncNonMetaRegionLocator.java:610)
>   at org.apache.hadoop.hbase.client.AsyncRegionLocator.updateCachedLocationOnError(AsyncRegionLocator.java:153)
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23376) NPE happens while replica region is moving
Sun Xin created HBASE-23376:
---

Summary: NPE happens while replica region is moving
Key: HBASE-23376
URL: https://issues.apache.org/jira/browse/HBASE-23376
Project: HBase
Issue Type: Bug
Components: read replicas
Reporter: Sun Xin
Assignee: Sun Xin

The following code is from AsyncNonMetaRegionLocator#addToCache
{code:java}
private RegionLocations addToCache(TableCache tableCache, RegionLocations locs) {
  LOG.trace("Try adding {} to cache", locs);
  byte[] startKey = locs.getDefaultRegionLocation().getRegion().getStartKey();
  ...
}{code}
An NPE is thrown if locs has no default region location.

The following code is from AsyncRegionLocatorHelper#updateCachedLocationOnError
{code:java}
...
if (cause instanceof RegionMovedException) {
  RegionMovedException rme = (RegionMovedException) cause;
  HRegionLocation newLoc =
    new HRegionLocation(loc.getRegion(), rme.getServerName(), rme.getLocationSeqNum());
  LOG.debug("Try updating {} with the new location {} constructed by {}", loc, newLoc,
    rme.toString());
  addToCache.accept(newLoc);
...{code}
If a replica region is moving, we get a RegionMovedException and add the HRegionLocation of the replica region to the cache, which finally triggers the NPE.

{code:java}
java.lang.NullPointerException
  at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.addToCache(AsyncNonMetaRegionLocator.java:240)
  at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.addLocationToCache(AsyncNonMetaRegionLocator.java:596)
  at org.apache.hadoop.hbase.client.AsyncRegionLocatorHelper.updateCachedLocationOnError(AsyncRegionLocatorHelper.java:80)
  at org.apache.hadoop.hbase.client.AsyncNonMetaRegionLocator.updateCachedLocationOnError(AsyncNonMetaRegionLocator.java:610)
  at org.apache.hadoop.hbase.client.AsyncRegionLocator.updateCachedLocationOnError(AsyncRegionLocator.java:153)
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
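The failure mode reported above can be reproduced with stand-in types: a locations holder that carries only a replica entry has no default entry, so dereferencing it fails. The sketch below shows one possible null guard. It is not necessarily the fix that was applied in HBase; `ReplicaCacheGuard`, `Location`, `Locations`, and `startKeyForCache` are hypothetical names, and the assumption that index 0 holds the default replica is made for illustration only.

```java
import java.util.Optional;

// Hypothetical stand-ins for HRegionLocation / RegionLocations, reduced to what the guard needs.
class ReplicaCacheGuard {

  static final class Location {
    final int replicaId;
    final String startKey;

    Location(int replicaId, String startKey) {
      this.replicaId = replicaId;
      this.startKey = startKey;
    }
  }

  static final class Locations {
    // Assumption for this sketch: the array is indexed by replica id, index 0 is the default.
    final Location[] byReplicaId;

    Locations(Location[] byReplicaId) {
      this.byReplicaId = byReplicaId;
    }

    Location getDefault() {
      return byReplicaId.length > 0 ? byReplicaId[0] : null;
    }
  }

  /** Returns the start key to cache under, or empty when there is no default location. */
  static Optional<String> startKeyForCache(Locations locs) {
    Location def = locs.getDefault();
    if (def == null) {
      // Without this check, def.startKey would throw a NullPointerException.
      return Optional.empty();
    }
    return Optional.of(def.startKey);
  }

  public static void main(String[] args) {
    // Only replica 1 is known, as when a RegionMovedException arrives for a replica region.
    Locations replicaOnly = new Locations(new Location[] { null, new Location(1, "a") });
    System.out.println("cacheable=" + startKeyForCache(replicaOnly).isPresent());
  }
}
```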
[jira] [Created] (HBASE-23345) Table need to replication unless all of cfs are excluded
Sun Xin created HBASE-23345:
---

Summary: Table need to replication unless all of cfs are excluded
Key: HBASE-23345
URL: https://issues.apache.org/jira/browse/HBASE-23345
Project: HBase
Issue Type: Bug
Components: Replication
Reporter: Sun Xin

ReplicationPeerConfig.needToReplicate returns false when replicateAllUserTables is true and excludeTableCFsMap contains only some of the table's CFs. The table should be skipped only when all of its CFs are excluded.

{code:java}
public boolean needToReplicate(TableName table) {
  if (replicateAllUserTables) {
    if (excludeNamespaces != null && excludeNamespaces.contains(table.getNamespaceAsString())) {
      return false;
    }
    if (excludeTableCFsMap != null && excludeTableCFsMap.containsKey(table)) {
      return false;
    }
    return true;
  } else {
    if (namespaces != null && namespaces.contains(table.getNamespaceAsString())) {
      return true;
    }
    if (tableCFsMap != null && tableCFsMap.containsKey(table)) {
      return true;
    }
    return false;
  }
}
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
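One way to express the requested behavior for the replicateAllUserTables branch is sketched below. It is a simplified illustration, not the committed patch: `NeedToReplicateSketch` is a hypothetical class, tables and namespaces are plain strings, and it assumes that a null or empty CF list in the exclude map means the whole table is excluded.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified version of the replicateAllUserTables == true branch only.
class NeedToReplicateSketch {

  static boolean needToReplicate(String namespace, String table,
      Set<String> excludeNamespaces, Map<String, List<String>> excludeTableCFsMap) {
    if (excludeNamespaces != null && excludeNamespaces.contains(namespace)) {
      return false;
    }
    if (excludeTableCFsMap != null && excludeTableCFsMap.containsKey(table)) {
      List<String> cfs = excludeTableCFsMap.get(table);
      // Assumption: null or empty means "exclude every CF", so the table is skipped.
      // A non-empty list excludes only those CFs; the table may still need replication
      // and the listed families must be filtered out later, per edit.
      return cfs != null && !cfs.isEmpty();
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, List<String>> partial =
        Collections.singletonMap("t1", Arrays.asList("cf1"));
    System.out.println(needToReplicate("default", "t1", null, partial));
  }
}
```

Note that this sketch cannot tell whether a non-empty list happens to name every CF of the table, since it does not see the table's schema; that case would need a per-family check.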