[jira] [Created] (HBASE-23633) Find a way to handle the corrupt recovered hfiles
Guanghao Zhang created HBASE-23633: -- Summary: Find a way to handle the corrupt recovered hfiles Key: HBASE-23633 URL: https://issues.apache.org/jira/browse/HBASE-23633 Project: HBase Issue Type: Umbrella Reporter: Guanghao Zhang Copied from the PR review comments. If the file is a corrupt HFile, an exception will be thrown here, which will cause the region to fail to open. Maybe we can add a new parameter to control whether to skip the exception, similar to recovered edits, which has the parameter "hbase.hregion.edits.replay.skip.errors". Regions that can't be opened because of detached References or corrupt hfiles are a fact of life. We need to work on this issue. This will be a new variant of the problem, i.e. bad recovered hfiles. As for adding a config to ignore bad files and just open, that's a bit dangerous, as @infraio notes, because it could mean silent data loss. -- This message was sent by Atlassian Jira (v8.3.4#803005)
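To make the trade-off in the comment above concrete, here is a minimal Java sketch of a "skip errors" switch. It is only an illustration under stated assumptions: `SkipCorruptSketch`, `Loader`, `loadAll` and the `skipErrors` flag are hypothetical names, not HBase APIs, and the file names are made up. The sketch returns the list of skipped files so that a caller could at least report them rather than lose data silently.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SkipCorruptSketch {
    /** Hypothetical stand-in for whatever opens and validates one recovered file. */
    interface Loader {
        void load(String file) throws IOException;
    }

    /**
     * Loads every file. With skipErrors=false the first failure aborts the open;
     * with skipErrors=true failing files are collected and returned instead.
     */
    static List<String> loadAll(List<String> files, boolean skipErrors, Loader loader)
            throws IOException {
        List<String> skipped = new ArrayList<>();
        for (String file : files) {
            try {
                loader.load(file);
            } catch (IOException e) {
                if (!skipErrors) {
                    throw e; // strict mode: the region fails to open
                }
                skipped.add(file); // lenient mode: remember what was dropped
            }
        }
        return skipped;
    }

    /** Runs the sketch on three made-up file names, one of which is "corrupt". */
    static String demo(boolean skipErrors) {
        Loader loader = file -> {
            if (file.equals("corrupt")) {
                throw new IOException("corrupt HFile: " + file);
            }
        };
        try {
            List<String> skipped =
                loadAll(Arrays.asList("good-1", "corrupt", "good-2"), skipErrors, loader);
            return "opened, skipped=" + skipped;
        } catch (IOException e) {
            return "open failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(false));
        System.out.println(demo(true));
    }
}
```

In lenient mode the data in the skipped file is still gone, which is the danger raised in the comment; the sketch only shows where such a switch would sit.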
[jira] [Resolved] (HBASE-23286) Improve MTTR: Split WAL to HFile
[ https://issues.apache.org/jira/browse/HBASE-23286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23286. Resolution: Fixed Pushed to branch-2 and master. Thanks all for reviewing. Opened two follow-up issues. > Improve MTTR: Split WAL to HFile > > > Key: HBASE-23286 > URL: https://issues.apache.org/jira/browse/HBASE-23286 > Project: HBase > Issue Type: Improvement > Components: MTTR >Affects Versions: 3.0.0, 2.3.0 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0 > > > After HBASE-20724, the compaction event marker is no longer used during failover. So our new proposal is to split WAL to HFile to improve MTTR. It has 3 steps: > # Read WAL and write HFile to region’s column family’s recovered.hfiles directory. > # Open region. > # Bulkload the recovered.hfiles for every column family. > The design doc is attached as a Google doc. Any suggestions are welcome.
[jira] [Reopened] (HBASE-23175) Yarn unable to acquire delegation token for HBase Spark jobs
[ https://issues.apache.org/jira/browse/HBASE-23175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-23175: Forgot to push to branch-2.2. Reopened it. > Yarn unable to acquire delegation token for HBase Spark jobs > > > Key: HBASE-23175 > URL: https://issues.apache.org/jira/browse/HBASE-23175 > Project: HBase > Issue Type: Bug > Components: security, spark >Affects Versions: 2.0.0 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.1.8, 2.2.3 > > Attachments: HBASE-23175.master.001.patch > > > Spark relies on the TokenUtil.obtainToken(conf) API, which was removed in HBase-2.0. SPARK-26432 fixes this by using the new API, but that fix is planned for Spark-3.0, so we need the fix in HBase until they release it and we upgrade. > {code} > 18/03/20 20:39:07 ERROR ApplicationMaster: User class threw exception: > org.apache.hadoop.hbase.HBaseIOException: > com.google.protobuf.ServiceException: Error calling method > hbase.pb.AuthenticationService.GetAuthenticationToken > org.apache.hadoop.hbase.HBaseIOException: > com.google.protobuf.ServiceException: Error calling method > hbase.pb.AuthenticationService.GetAuthenticationToken > at > org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:360) > at > org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:346) > at > org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:86) > at > org.apache.hadoop.hbase.security.token.TokenUtil$1.run(TokenUtil.java:121) > at > org.apache.hadoop.hbase.security.token.TokenUtil$1.run(TokenUtil.java:118) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at >
org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:313) > at > org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:118) > at > org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(TokenUtil.java:272) > at > org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initCredentials(TableMapReduceUtil.java:533) > at > org.apache.hadoop.hbase.spark.HBaseContext.(HBaseContext.scala:73) > at > org.apache.hadoop.hbase.spark.JavaHBaseContext.(JavaHBaseContext.scala:46) > at > org.apache.hadoop.hbase.spark.example.hbasecontext.JavaHBaseBulkDeleteExample.main(JavaHBaseBulkDeleteExample.java:64) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:706) > Caused by: com.google.protobuf.ServiceException: Error calling method > hbase.pb.AuthenticationService.GetAuthenticationToken > at > org.apache.hadoop.hbase.client.SyncCoprocessorRpcChannel.callBlockingMethod(SyncCoprocessorRpcChannel.java:71) > at > org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512) > at > org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:81) > ... 17 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23175) Yarn unable to acquire delegation token for HBase Spark jobs
[ https://issues.apache.org/jira/browse/HBASE-23175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23175. Resolution: Fixed Pushed to branch-2.2. > Yarn unable to acquire delegation token for HBase Spark jobs > > > Key: HBASE-23175 > URL: https://issues.apache.org/jira/browse/HBASE-23175 > Project: HBase > Issue Type: Bug > Components: security, spark >Affects Versions: 2.0.0 >Reporter: Ankit Singhal >Assignee: Ankit Singhal >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.2.3, 2.1.8 > > Attachments: HBASE-23175.master.001.patch > > > Spark relies on the TokenUtil.obtainToken(conf) API, which was removed in HBase-2.0. SPARK-26432 fixes this by using the new API, but that fix is planned for Spark-3.0, so we need the fix in HBase until they release it and we upgrade. > {code} > 18/03/20 20:39:07 ERROR ApplicationMaster: User class threw exception: > org.apache.hadoop.hbase.HBaseIOException: > com.google.protobuf.ServiceException: Error calling method > hbase.pb.AuthenticationService.GetAuthenticationToken > org.apache.hadoop.hbase.HBaseIOException: > com.google.protobuf.ServiceException: Error calling method > hbase.pb.AuthenticationService.GetAuthenticationToken > at > org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:360) > at > org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:346) > at > org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:86) > at > org.apache.hadoop.hbase.security.token.TokenUtil$1.run(TokenUtil.java:121) > at > org.apache.hadoop.hbase.security.token.TokenUtil$1.run(TokenUtil.java:118) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) > at > org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:313) >
at > org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:118) > at > org.apache.hadoop.hbase.security.token.TokenUtil.addTokenForJob(TokenUtil.java:272) > at > org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initCredentials(TableMapReduceUtil.java:533) > at > org.apache.hadoop.hbase.spark.HBaseContext.(HBaseContext.scala:73) > at > org.apache.hadoop.hbase.spark.JavaHBaseContext.(JavaHBaseContext.scala:46) > at > org.apache.hadoop.hbase.spark.example.hbasecontext.JavaHBaseBulkDeleteExample.main(JavaHBaseBulkDeleteExample.java:64) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:706) > Caused by: com.google.protobuf.ServiceException: Error calling method > hbase.pb.AuthenticationService.GetAuthenticationToken > at > org.apache.hadoop.hbase.client.SyncCoprocessorRpcChannel.callBlockingMethod(SyncCoprocessorRpcChannel.java:71) > at > org.apache.hadoop.hbase.protobuf.generated.AuthenticationProtos$AuthenticationService$BlockingStub.getAuthenticationToken(AuthenticationProtos.java:4512) > at > org.apache.hadoop.hbase.security.token.TokenUtil.obtainToken(TokenUtil.java:81) > ... 17 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23637) Generate CHANGES.md and RELEASENOTES.md for 2.2.3
Guanghao Zhang created HBASE-23637: -- Summary: Generate CHANGES.md and RELEASENOTES.md for 2.2.3 Key: HBASE-23637 URL: https://issues.apache.org/jira/browse/HBASE-23637 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang
[jira] [Resolved] (HBASE-23553) Snapshot referenced data files are deleted in some case
[ https://issues.apache.org/jira/browse/HBASE-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23553. Fix Version/s: 2.2.3 2.3.0 3.0.0 Resolution: Fixed > Snapshot referenced data files are deleted in some case > --- > > Key: HBASE-23553 > URL: https://issues.apache.org/jira/browse/HBASE-23553 > Project: HBase > Issue Type: Bug >Reporter: Yi Mei >Assignee: Yi Mei >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.2.3 > > > We scan snapshot in our cluster and got following exception: > {code:java} > java.io.IOException: java.io.IOException: java.io.FileNotFoundException: > Unable to open link: org.apache.hadoop.hbase.io.HFileLink > locations=[hdfs://tjwqsrv-galaxy98/hbase/tjwqsrv-galaxy98/data/default/galaxy_online_fds_object_table/06dd90d8540b56343859b63a6134450c/A/4a6cf05f419a9f61059cb05a962f, > > hdfs://tjwqsrv-galaxy98/hbase/tjwqsrv-galaxy98/.tmp/data/default/galaxy_online_fds_object_table/06dd90d8540b56343859b63a6134450c/A/4a6cf05f419a9f61059cb05a962f, > > hdfs://tjwqsrv-galaxy98/hbase/tjwqsrv-galaxy98/archive/data/default/galaxy_online_fds_object_table/06dd90d8540b56343859b63a6134450c/A/4a6cf05f419a9f61059cb05a962f] > > at > org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:867) > > at > org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:778) > at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:749) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5306) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5271) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5243) > at > org.apache.hadoop.hbase.client.ClientSideRegionScanner.(ClientSideRegionScanner.java:72) > > at > org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl$RecordReader.initialize(TableSnapshotInputFormatImpl.java:239) > > at > 
org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat$TableSnapshotRegionRecordReader.initialize(TableSnapshotInputFormat.java:150) > > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:552) > at {code} > I checked the namenode logs and found that this file was deleted by the hbase cleaner although a snapshot still references it.
[jira] [Created] (HBASE-23638) Set version to 2.2.3 in branch-2.2 for first RC of 2.2.3
Guanghao Zhang created HBASE-23638: -- Summary: Set version to 2.2.3 in branch-2.2 for first RC of 2.2.3 Key: HBASE-23638 URL: https://issues.apache.org/jira/browse/HBASE-23638 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang
[jira] [Created] (HBASE-23655) Fix flaky TestRSGroupsKillRS: should wait the SCP to finish
Guanghao Zhang created HBASE-23655: -- Summary: Fix flaky TestRSGroupsKillRS: should wait the SCP to finish Key: HBASE-23655 URL: https://issues.apache.org/jira/browse/HBASE-23655 Project: HBase Issue Type: Bug Affects Versions: 2.2.2 Reporter: Guanghao Zhang
[jira] [Created] (HBASE-23659) BaseLoadBalancer#wouldLowerAvailability should consider region replicas
Guanghao Zhang created HBASE-23659: -- Summary: BaseLoadBalancer#wouldLowerAvailability should consider region replicas Key: HBASE-23659 URL: https://issues.apache.org/jira/browse/HBASE-23659 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Found this issue while trying to fix the flaky unit test TestRegionReplicaSplit. It may fail with java.lang.AssertionError: Splitted regions should not be assigned to same region server. See [https://builds.apache.org/job/HBase-Flaky-Tests/job/master/5227/testReport/junit/org.apache.hadoop.hbase.master.assignment/TestRegionReplicaSplit/testRegionReplicaSplitRegionAssignment/].
[jira] [Created] (HBASE-23658) Fix flaky TestSnapshotFromMaster
Guanghao Zhang created HBASE-23658: -- Summary: Fix flaky TestSnapshotFromMaster Key: HBASE-23658 URL: https://issues.apache.org/jira/browse/HBASE-23658 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang testAsyncSnapshotWillNotBlockSnapshotHFileCleaner is flaky. The assert may fail. {code:java} assertTrue(master.getSnapshotManager().isTakingAnySnapshot()); future.get(); // in branch-2.2, this is a Thread.sleep assertFalse(master.getSnapshotManager().isTakingAnySnapshot()); {code} See [https://builds.apache.org/job/HBase-Flaky-Tests/job/master/5227/testReport/junit/org.apache.hadoop.hbase.master.cleaner/TestSnapshotFromMaster/testAsyncSnapshotWillNotBlockSnapshotHFileCleaner/] [https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.2/lastSuccessfulBuild/artifact/dashboard.html]
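The comment in the snippet above notes that branch-2.2 uses a Thread.sleep at that point. A fixed sleep is a common source of this kind of flakiness, and one general alternative is to poll the condition until a deadline. The Java sketch below illustrates that pattern only; `WaitForSketch.waitFor` is a hypothetical helper, and the message does not say what the actual fix for this test is.

```java
import java.util.function.BooleanSupplier;

class WaitForSketch {
    /**
     * Polls the condition every intervalMs until it holds or timeoutMs elapses.
     * Returns true if the condition was observed to hold, false on timeout
     * or if the waiting thread is interrupted.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "no snapshot is being taken any more": true after ~100 ms.
        boolean done = waitFor(2000, 10, () -> System.currentTimeMillis() - start >= 100);
        System.out.println("condition met: " + done);
    }
}
```

In a test, the final assertion would then check the return value of the wait rather than sample the state once after a fixed delay.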
[jira] [Created] (HBASE-23964) Set version to 2.2.4 in branch-2.2 for first RC of 2.2.4
Guanghao Zhang created HBASE-23964: -- Summary: Set version to 2.2.4 in branch-2.2 for first RC of 2.2.4 Key: HBASE-23964 URL: https://issues.apache.org/jira/browse/HBASE-23964 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang
[jira] [Resolved] (HBASE-23953) SimpleBalancer bug when second pass to fill up to min
[ https://issues.apache.org/jira/browse/HBASE-23953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23953. Fix Version/s: (was: 2.2.5) 2.2.4 2.3.0 3.0.0 Resolution: Fixed Pushed to branch-2.2+. Thanks [~niuyulin] for contributing. > SimpleBalancer bug when second pass to fill up to min > - > > Key: HBASE-23953 > URL: https://issues.apache.org/jira/browse/HBASE-23953 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 2.2.0 >Reporter: niuyulin >Assignee: niuyulin >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.2.4 > >
[jira] [Resolved] (HBASE-23964) Set version to 2.2.4 in branch-2.2 for first RC of 2.2.4
[ https://issues.apache.org/jira/browse/HBASE-23964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23964. Resolution: Fixed > Set version to 2.2.4 in branch-2.2 for first RC of 2.2.4 > > > Key: HBASE-23964 > URL: https://issues.apache.org/jira/browse/HBASE-23964 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.4 > >
[jira] [Resolved] (HBASE-23965) Generate CHANGES.md and RELEASENOTES.md for 2.2.4
[ https://issues.apache.org/jira/browse/HBASE-23965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23965. Fix Version/s: 2.2.4 Resolution: Fixed Thanks [~meiyi] for reviewing. Pushed to branch-2.2. > Generate CHANGES.md and RELEASENOTES.md for 2.2.4 > - > > Key: HBASE-23965 > URL: https://issues.apache.org/jira/browse/HBASE-23965 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.4 > >
[jira] [Created] (HBASE-23965) Generate CHANGES.md and RELEASENOTES.md for 2.2.4
Guanghao Zhang created HBASE-23965: -- Summary: Generate CHANGES.md and RELEASENOTES.md for 2.2.4 Key: HBASE-23965 URL: https://issues.apache.org/jira/browse/HBASE-23965 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang
[jira] [Resolved] (HBASE-23684) NPE HFilesOutputSink
[ https://issues.apache.org/jira/browse/HBASE-23684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23684. Resolution: Duplicate Resolved as duplicate. > NPE HFilesOutputSink > > > Key: HBASE-23684 > URL: https://issues.apache.org/jira/browse/HBASE-23684 > Project: HBase > Issue Type: Bug > Components: MTTR, wal >Affects Versions: 2.3.0 >Reporter: Michael Stack > Assignee: Guanghao Zhang >Priority: Critical > Fix For: 3.0.0, 2.3.0 > > > Enabling the new split to hfiles feature, HBASE-23286, running branch-2 tip, > I see this out on RegionServers: > {code} > 2020-01-13 17:37:08,204 INFO org.apache.hadoop.hbase.wal.OutputSink: 3 split > writer threads finished > 2020-01-13 17:37:08,233 INFO org.apache.hadoop.hbase.wal.WALSplitter: > Processed 1007 edits across 0 regions cost 284 ms; edits skipped=76; > WAL=hdfs://nameservice1/hbase/genie/WALs/hbasedn101.example.org,16020,1578934806382-splitting/hbasedn101.example.org%2C16020%2C1578934806382.1578937008832, > size=128.5 M, length=134708720, corrupted=false, progress failed=true > 2020-01-13 17:37:08,234 WARN > org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of > WALs/hbasedn101.example.org,16020,1578934806382-splitting/hbasedn101.example.org%2C16020%2C1578934806382.1578937008832 > failed, returning error > java.io.IOException: java.lang.NullPointerException > at > org.apache.hadoop.hbase.wal.BoundedRecoveredHFilesOutputSink.writeRemainingEntryBuffers(BoundedRecoveredHFilesOutputSink.java:173) > at > org.apache.hadoop.hbase.wal.BoundedRecoveredHFilesOutputSink.close(BoundedRecoveredHFilesOutputSink.java:140) > at > org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:339) > at > org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:181) > at > org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:105) > at > 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.lambda$new$0(SplitLogWorker.java:84) > at > org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70) > at > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hbase.wal.BoundedRecoveredHFilesOutputSink.configContextForNonMetaWriter(BoundedRecoveredHFilesOutputSink.java:225) > at > org.apache.hadoop.hbase.wal.BoundedRecoveredHFilesOutputSink.createRecoveredHFileWriter(BoundedRecoveredHFilesOutputSink.java:213) > at > org.apache.hadoop.hbase.wal.BoundedRecoveredHFilesOutputSink.append(BoundedRecoveredHFilesOutputSink.java:117) > at > org.apache.hadoop.hbase.wal.BoundedRecoveredHFilesOutputSink.lambda$writeRemainingEntryBuffers$3(BoundedRecoveredHFilesOutputSink.java:155) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > {code} > It is a bit odd because log says there were zero regions. Not sure what that > was about. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23895) STUCK Region-In-Transition when failed to insert procedure to procedure store
[ https://issues.apache.org/jira/browse/HBASE-23895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23895. Resolution: Fixed Pushed to master and branch-2. Thanks [~zhangduo] and [~stack] for reviewing. > STUCK Region-In-Transition when failed to insert procedure to procedure store > - > > Key: HBASE-23895 > URL: https://issues.apache.org/jira/browse/HBASE-23895 > Project: HBase > Issue Type: Bug > Components: proc-v2, RegionProcedureStore > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0 > > Attachments: suggestion.patch > > > When moving a region, a TRSP is generated first and set on the region state node. But if submitting the TRSP fails, the procedure is currently never unset and the region stays stuck in RIT. > hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java > {code:java} > public Future<byte[]> moveAsync(RegionPlan regionPlan) throws HBaseIOException { > TransitRegionStateProcedure proc = > createMoveRegionProcedure(regionPlan.getRegionInfo(), regionPlan.getDestination()); > return ProcedureSyncWait.submitProcedure(master.getMasterProcedureExecutor(), proc); > } > public TransitRegionStateProcedure createMoveRegionProcedure(RegionInfo regionInfo, > ServerName targetServer) throws HBaseIOException { > RegionStateNode regionNode = this.regionStates.getRegionStateNode(regionInfo); > if (regionNode == null) { > throw new UnknownRegionException("No RegionStateNode found for " + > regionInfo.getEncodedName() + "(Closed/Deleted?)"); > } > TransitRegionStateProcedure proc; > regionNode.lock(); > try { > preTransitCheck(regionNode, STATES_EXPECTED_ON_UNASSIGN_OR_MOVE); > regionNode.checkOnline(); > proc = TransitRegionStateProcedure.move(getProcedureEnvironment(), regionInfo, targetServer); > regionNode.setProcedure(proc); > } finally { > regionNode.unlock(); > } > return proc; > } > {code} >
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateNode.java > {code:java} > public void setProcedure(TransitRegionStateProcedure proc) { > assert this.procedure == null; > this.procedure = proc; > ritMap.put(regionInfo, this); > } > public void unsetProcedure(TransitRegionStateProcedure proc) { > assert this.procedure == proc; > this.procedure = null; > ritMap.remove(regionInfo, this); > } > {code} > {code:java} > 2020-02-26,13:45:21,344 ERROR > [RpcServer.default.RWQ.Fifo.read.handler=437,queue=5,port=21500] > org.apache.hadoop.hbase.ipc.RpcServer: Unexpected throwable object > java.io.UncheckedIOException: > org.apache.hadoop.hbase.exceptions.TimeoutIOException: Timed out waiting for > lock for row: \x00\x00\x00\x00\x00\x0B\xAB\xD2 in region > 9731aea823e7f83264b14713ae486fb7 > at > org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.update(RegionProcedureStore.java:588) > at > org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.insert(RegionProcedureStore.java:545) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.submitProcedure(ProcedureExecutor.java:1042) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.submitProcedure(ProcedureExecutor.java:860) > at > org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitProcedure(ProcedureSyncWait.java:123) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.moveAsync(AssignmentManager.java:657) > at > org.apache.hadoop.hbase.master.HMaster.executeRegionPlansWithThrottling(HMaster.java:1793) > at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1761) > at > org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:654) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:374) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:135) > at > 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:352) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Han
[jira] [Resolved] (HBASE-23944) The method setClusterLoad of SimpleLoadBalancer is incorrect when balance by table
[ https://issues.apache.org/jira/browse/HBASE-23944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23944. Resolution: Fixed > The method setClusterLoad of SimpleLoadBalancer is incorrect when balance by table > --- > > Key: HBASE-23944 > URL: https://issues.apache.org/jira/browse/HBASE-23944 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 2.2.2 >Reporter: niuyulin >Assignee: niuyulin >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.2.4, 2.1.10 > > > Currently, if the clusterLoad parameter is organized by table, for example > {code:java} > table1=> > server1=>[table1,region1] > server2=>[] > table2=> > server1=>[table2,region1] > server2=>[] > {code} > then the member variable serverLoadList is: > {code:java} > [{server1, load 1}{server2, load 0}{server1, load 1} {server2, load 0}] > {code} > and the cluster will be considered balanced in method overallNeedsBalance.
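The example in the description can be reproduced with a small Java sketch. It assumes, as the description suggests, that the by-table input yields one list entry per (table, server) pair, so each server appears once per table. Summing region counts per server across tables gives one entry per server instead. `ClusterLoadSketch` and its methods are hypothetical names for illustration, not the actual patch or SimpleLoadBalancer code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ClusterLoadSketch {
    /** Builds the by-table load from the description: table -> server -> regions. */
    static Map<String, Map<String, List<String>>> exampleLoadByTable() {
        Map<String, Map<String, List<String>>> byTable = new LinkedHashMap<>();
        for (String table : Arrays.asList("table1", "table2")) {
            Map<String, List<String>> servers = new LinkedHashMap<>();
            servers.put("server1", Collections.singletonList(table + ",region1"));
            servers.put("server2", Collections.<String>emptyList());
            byTable.put(table, servers);
        }
        return byTable;
    }

    /** Sums the region count of each server over all tables. */
    static Map<String, Integer> totalLoadPerServer(
            Map<String, Map<String, List<String>>> byTable) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map<String, List<String>> servers : byTable.values()) {
            for (Map.Entry<String, List<String>> e : servers.entrySet()) {
                totals.merge(e.getKey(), e.getValue().size(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Prints {server1=2, server2=0}
        System.out.println(totalLoadPerServer(exampleLoadByTable()));
    }
}
```

With per-server totals, server1 carries 2 regions and server2 carries 0, so the skew that the four per-table entries obscure becomes visible.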
[jira] [Resolved] (HBASE-23739) BoundedRecoveredHFilesOutputSink should read the table descriptor directly
[ https://issues.apache.org/jira/browse/HBASE-23739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23739. Fix Version/s: 2.3.0 3.0.0 Resolution: Fixed Pushed to branch-2+. > BoundedRecoveredHFilesOutputSink should read the table descriptor directly > -- > > Key: HBASE-23739 > URL: https://issues.apache.org/jira/browse/HBASE-23739 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0 > > > Read from meta or filesystem?
[jira] [Resolved] (HBASE-24021) Fail fast when bulkLoadHFiles method catch some IOException
[ https://issues.apache.org/jira/browse/HBASE-24021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24021. Fix Version/s: 2.2.5 2.4.0 2.3.0 3.0.0 Resolution: Fixed Pushed to branch-2.2+. > Fail fast when bulkLoadHFiles method catch some IOException > --- > > Key: HBASE-24021 > URL: https://issues.apache.org/jira/browse/HBASE-24021 > Project: HBase > Issue Type: Improvement > Components: HFile, regionserver >Reporter: niuyulin >Assignee: niuyulin >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.4.0, 2.2.5 > > > In production environments, we usually bulkload a huge number of hfiles. It is reasonable to fail fast when any IOException occurs. > > hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java > {code:java} > public Map<byte[], List<Path>> bulkLoadHFiles(Collection<Pair<byte[], String>> familyPaths, > boolean assignSeqId, BulkLoadListener bulkLoadListener, > boolean copyFile, List<String> clusterIds, boolean replicate) throws IOException { > .. > try { > this.writeRequestsCount.increment(); > // There possibly was a split that happened between when the split keys > // were gathered and before the HRegion's write lock was taken. We need > // to validate the HFile region before attempting to bulk load all of them > List<IOException> ioes = new ArrayList<>(); > List<Pair<byte[], String>> failures = new ArrayList<>(); > for (Pair<byte[], String> p : familyPaths) { > byte[] familyName = p.getFirst(); > String path = p.getSecond(); > HStore store = getStore(familyName); > if (store == null) { > IOException ioe = new org.apache.hadoop.hbase.DoNotRetryIOException( > "No such column family " + Bytes.toStringBinary(familyName)); > ioes.add(ioe); > } else { > try { > store.assertBulkLoadHFileOk(new Path(path)); > } catch (WrongRegionException wre) { > // recoverable (file doesn't fit in region) > failures.add(p); > } catch (IOException ioe) { > // unrecoverable (hdfs problem) > ioes.add(ioe); > } > } > } > // validation failed because of some sort of IO problem.
> if (ioes.size() != 0) { > IOException e = MultipleIOException.createIOException(ioes); > LOG.error("There were one or more IO errors when checking if the bulk load is ok.", e); > throw e; > } > {code}
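The quoted loop collects every unrecoverable IOException and only throws after all files have been checked. A fail-fast variant stops at the first one. The Java sketch below shows that idea in isolation; `FailFastSketch`, `Validator`, and `RecoverableException` (standing in for the WrongRegionException case) are hypothetical names, and this is not the committed change.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FailFastSketch {
    /** Hypothetical marker for failures the caller can retry or handle later. */
    static class RecoverableException extends IOException {
        RecoverableException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the per-file validation step. */
    interface Validator {
        void check(String path) throws IOException;
    }

    /**
     * Validates paths in order. Recoverable failures are collected and returned;
     * any other IOException propagates at once, so later files are never checked.
     */
    static List<String> validateFailFast(List<String> paths, Validator validator)
            throws IOException {
        List<String> failures = new ArrayList<>();
        for (String path : paths) {
            try {
                validator.check(path);
            } catch (RecoverableException e) {
                failures.add(path);
            }
        }
        return failures;
    }

    /** Counts how many files get checked when the second of three is unreadable. */
    static int checksBeforeFailure() {
        int[] checks = {0};
        try {
            validateFailFast(Arrays.asList("f1", "bad", "f3"), path -> {
                checks[0]++;
                if (path.equals("bad")) {
                    throw new IOException("cannot read " + path);
                }
            });
        } catch (IOException expected) {
            // the first unrecoverable error ends validation
        }
        return checks[0];
    }

    public static void main(String[] args) {
        System.out.println("files checked before failing: " + checksBeforeFailure());
    }
}
```

With a large number of files, stopping early avoids validating the rest of the batch after the outcome is already known to be a failure.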
[jira] [Resolved] (HBASE-24037) Add ut for root dir and wal root dir are different
[ https://issues.apache.org/jira/browse/HBASE-24037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24037. Fix Version/s: 2.4.0 2.3.0 3.0.0 Resolution: Fixed Pushed to branch-2.3+. > Add ut for root dir and wal root dir are different > -- > > Key: HBASE-24037 > URL: https://issues.apache.org/jira/browse/HBASE-24037 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.4.0 > >
[jira] [Resolved] (HBASE-23949) refactor loadBalancer implements for rsgroup balance by table to achieve overallbalanced
[ https://issues.apache.org/jira/browse/HBASE-23949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23949. Fix Version/s: 2.2.5 2.4.0 2.3.0 3.0.0 Resolution: Fixed Pushed to branch-2.2+. Thanks [~niuyulin] for contributing. > refactor loadBalancer implements for rsgroup balance by table to achieve overallbalanced > -- > > Key: HBASE-23949 > URL: https://issues.apache.org/jira/browse/HBASE-23949 > Project: HBase > Issue Type: Bug > Components: rsgroup >Affects Versions: 2.2.0 >Reporter: niuyulin >Assignee: niuyulin >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.4.0, 2.2.5 > > > Currently, overall balance cannot be achieved when the rsgroup balancer is used with by-table balancing turned on, because balancing each table actually uses a clusterload that contains only that one table's load. We should balance with a clusterload that contains the load of all tables in the rsgroup. > > hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java > {code:java} > public boolean balance(boolean force) throws IOException { > .. > boolean isByTable = > getConfiguration().getBoolean("hbase.master.loadbalance.bytable", false); > Map<TableName, Map<ServerName, List<RegionInfo>>> assignments = > this.assignmentManager.getRegionStates() > .getAssignmentsForBalancer(tableStateManager, > this.serverManager.getOnlineServersList(), > isByTable); > for (Map<ServerName, List<RegionInfo>> serverMap : assignments.values()) { > serverMap.keySet().removeAll(this.serverManager.getDrainingServersList()); > } > //Give the balancer the current cluster state.
> this.balancer.setClusterMetrics(getClusterMetricsWithoutCoprocessor()); > this.balancer.setClusterLoad(assignments); > List<RegionPlan> plans = new ArrayList<>(); > for (Entry<TableName, Map<ServerName, List<RegionInfo>>> e : assignments.entrySet()) { > List<RegionPlan> partialPlans = this.balancer.balanceCluster(e.getKey(), e.getValue()); > if (partialPlans != null) { > plans.addAll(partialPlans); > } > } > {code} > Proposed refactor: > # Add a method 'balanceTable' to the LoadBalancer interface. > # SimpleLoadBalancer and StochasticLoadBalancer implement the real 'balanceTable'; 'balanceTable' is not supported in BaseLoadBalancer and RSGroupBasedLoadBalancer. > # RSGroupBasedLoadBalancer invokes balanceCluster and passes the GroupClusterLoad to the internal balancer, group by group. > # The internal balancer's balanceCluster invokes 'balanceTable'.
[jira] [Created] (HBASE-24363) Fix failed ut TestAssignmentManagerMetrics for branch-2.2
Guanghao Zhang created HBASE-24363: -- Summary: Fix failed ut TestAssignmentManagerMetrics for branch-2.2 Key: HBASE-24363 URL: https://issues.apache.org/jira/browse/HBASE-24363 Project: HBase Issue Type: Bug Affects Versions: 2.2.4 Reporter: Guanghao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24165) maxPoolSize is logged incorrectly in ByteBufferPool
[ https://issues.apache.org/jira/browse/HBASE-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24165. Resolution: Fixed > maxPoolSize is logged incorrectly in ByteBufferPool > --- > > Key: HBASE-24165 > URL: https://issues.apache.org/jira/browse/HBASE-24165 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.4 >Reporter: Pankaj Kumar >Assignee: Pankaj Kumar >Priority: Minor > Fix For: 2.2.5 > > > In ByteBufferPool _maxPoolSize_ is converted into byte format, > https://github.com/apache/hbase/blob/a521a80c4b9a8b0749c368d1ff66fea2ed2d77a2/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferPool.java#L85 > > Currently maxPoolSize is logged as below, > 2020-04-10 14:20:56,000 INFO [Time-limited test] io.ByteBufferPool(83): > Created with bufferSize=64 KB and maxPoolSize=320 B -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24381) The Size metrics in Master Webui is wrong if the size is 0
[ https://issues.apache.org/jira/browse/HBASE-24381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24381. Fix Version/s: 2.2.5 2.3.0 3.0.0-alpha-1 Resolution: Fixed Pushed to branch-2.2+. Thanks [~DeanZ] for contributing. > The Size metrics in Master Webui is wrong if the size is 0 > -- > > Key: HBASE-24381 > URL: https://issues.apache.org/jira/browse/HBASE-24381 > Project: HBase > Issue Type: Bug > Components: UI >Affects Versions: 2.2.4 >Reporter: Baiqiang Zhao >Assignee: Baiqiang Zhao >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.5 > > Attachments: master-webui-size-wrong.png > > > As shown in attachment, there is no storefiles on the last RS, but the > StoreFile Size is as large as the previous RS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24080) [flakey test] TestRegionReplicaFailover.testSecondaryRegionKill fails.
[ https://issues.apache.org/jira/browse/HBASE-24080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24080. Resolution: Fixed > [flakey test] TestRegionReplicaFailover.testSecondaryRegionKill fails. > -- > > Key: HBASE-24080 > URL: https://issues.apache.org/jira/browse/HBASE-24080 > Project: HBase > Issue Type: Test > Components: read replicas >Affects Versions: 3.0.0-alpha-1, 2.3.0 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.5 > > > Run into the following error locally: > {code:java} > --- > Test set: org.apache.hadoop.hbase.regionserver.TestRegionReplicaFailover > --- > Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 97.391 s <<< > FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionReplicaFailover > org.apache.hadoop.hbase.regionserver.TestRegionReplicaFailover.testSecondaryRegionKill > Time elapsed: 28.682 s <<< FAILURE! > java.lang.AssertionError: Failed verification of row :0 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.HBaseTestingUtility.verifyNumericRows(HBaseTestingUtility.java:2407) > at > org.apache.hadoop.hbase.regionserver.TestRegionReplicaFailover.testSecondaryRegionKill(TestRegionReplicaFailover.java:240) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24363) Fix failed ut TestAssignmentManagerMetrics for branch-2.2
[ https://issues.apache.org/jira/browse/HBASE-24363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24363. Fix Version/s: 2.2.5 Resolution: Fixed Pushed to branch-2.2. Thanks [~meiyi] for reviewing. > Fix failed ut TestAssignmentManagerMetrics for branch-2.2 > - > > Key: HBASE-24363 > URL: https://issues.apache.org/jira/browse/HBASE-24363 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.4 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.5 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24201) Fix CI builds on branch-2.2
[ https://issues.apache.org/jira/browse/HBASE-24201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang resolved HBASE-24201.
Resolution: Invalid

> Fix CI builds on branch-2.2
>
> Key: HBASE-24201
> URL: https://issues.apache.org/jira/browse/HBASE-24201
> Project: HBase
> Issue Type: Task
> Components: build
> Affects Versions: 2.2.5
> Reporter: Nick Dimiduk
> Assignee: Guanghao Zhang
> Priority: Major
>
> From a recent [PR build|https://builds.apache.org/blue/organizations/jenkins/HBase-PreCommit-GitHub-PR/detail/PR-1532/1/pipeline/]
> {noformat}
> [2020-04-16T18:43:21.548Z] Setting up ruby2.3 (2.3.3-1+deb9u7) ...
> [2020-04-16T18:43:21.548Z] Setting up ruby2.3-dev:amd64 (2.3.3-1+deb9u7) ...
> [2020-04-16T18:43:21.548Z] Setting up ruby-dev:amd64 (1:2.3.3) ...
> [2020-04-16T18:43:21.548Z] Setting up ruby (1:2.3.3) ...
> [2020-04-16T18:43:22.261Z] Processing triggers for libc-bin (2.24-11+deb9u3) ...
> [2020-04-16T18:43:22.975Z] Successfully installed rake-13.0.1
> [2020-04-16T18:43:22.975Z] Building native extensions. This could take a while...
> [2020-04-16T18:43:25.277Z] ERROR: Error installing rubocop:
> [2020-04-16T18:43:25.277Z] rubocop requires Ruby version >= 2.4.0.
> {noformat}
> Looks like the Dockerfile on branch-2.2 has bit-rot. I suspect package versions are partially pinned or not pinned at all: the rubocop version has incremented but the ruby version has not.
[jira] [Resolved] (HBASE-20289) Comparator for NormalizationPlan breaks comparator's convention
[ https://issues.apache.org/jira/browse/HBASE-20289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-20289. Fix Version/s: 2.2.5 Resolution: Fixed Pushed to branch-2.2. Thanks [~twyuki] for contributing. > Comparator for NormalizationPlan breaks comparator's convention > --- > > Key: HBASE-20289 > URL: https://issues.apache.org/jira/browse/HBASE-20289 > Project: HBase > Issue Type: Bug > Components: master >Reporter: Yuki Tawara >Assignee: Yuki Tawara >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.5 > > Attachments: HBASE-20289.master.001.patch > > > Comparator must meet the condition: sign(comparator(plan1, plan2)) = - > sign(comparator(plan2, plan1)). > Current implementation breaks above condition. -- This message was sent by Atlassian Jira (v8.3.4#803005)
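The condition quoted in HBASE-20289, sign(comparator(plan1, plan2)) = -sign(comparator(plan2, plan1)), can be checked mechanically. The sketch below is a guess at the kind of comparator that violates it and one that satisfies it; the `Plan` class, the `BROKEN` and `FIXED` comparators, and the "split plans first" ordering are assumptions for illustration, not the code attached to the issue.

```java
import java.util.Comparator;

// Hypothetical illustration only; not the comparator from the HBase patch.
final class PlanComparatorSketch {
  // Stand-in for a normalization plan; only its kind matters here.
  static final class Plan {
    final boolean split;

    Plan(boolean split) {
      this.split = split;
    }
  }

  // Asymmetric: two split plans compare as -1 in both directions.
  static final Comparator<Plan> BROKEN = (a, b) -> {
    if (a.split) {
      return -1;
    }
    if (b.split) {
      return 1;
    }
    return 0;
  };

  // Symmetric: split plans sort first, plans of the same kind compare as 0.
  static final Comparator<Plan> FIXED = (a, b) -> Boolean.compare(b.split, a.split);

  // True when sign(compare(a, b)) == -sign(compare(b, a)).
  static boolean antisymmetric(Comparator<Plan> c, Plan a, Plan b) {
    return Integer.signum(c.compare(a, b)) == -Integer.signum(c.compare(b, a));
  }

  public static void main(String[] args) {
    Plan s1 = new Plan(true);
    Plan s2 = new Plan(true);
    System.out.println(antisymmetric(BROKEN, s1, s2)); // false
    System.out.println(antisymmetric(FIXED, s1, s2));  // true
  }
}
```

A comparator that breaks this rule can make sorting routines throw or produce an order that depends on the input arrangement, which is why the contract matters even when the plans are otherwise interchangeable.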
[jira] [Reopened] (HBASE-20289) Comparator for NormalizationPlan breaks comparator's convention
[ https://issues.apache.org/jira/browse/HBASE-20289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-20289: Reopen for backport to branch-2.2. > Comparator for NormalizationPlan breaks comparator's convention > --- > > Key: HBASE-20289 > URL: https://issues.apache.org/jira/browse/HBASE-20289 > Project: HBase > Issue Type: Bug > Components: master >Reporter: Yuki Tawara >Assignee: Yuki Tawara >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.3.0 > > Attachments: HBASE-20289.master.001.patch > > > Comparator must meet the condition: sign(comparator(plan1, plan2)) = - > sign(comparator(plan2, plan1)). > Current implementation breaks above condition. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-24165) maxPoolSize is logged incorrectly in ByteBufferPool
[ https://issues.apache.org/jira/browse/HBASE-24165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-24165: Reopened because this introduces a findbugs warning: Result of integer multiplication cast to long in new org.apache.hadoop.hbase.io.ByteBufferPool(int, int, boolean) At ByteBufferPool.java:[line 84] > maxPoolSize is logged incorrectly in ByteBufferPool > --- > > Key: HBASE-24165 > URL: https://issues.apache.org/jira/browse/HBASE-24165 > Project: HBase > Issue Type: Bug > Affects Versions: 2.2.4 > Reporter: Pankaj Kumar > Assignee: Pankaj Kumar > Priority: Minor > Fix For: 2.2.5 > > > In ByteBufferPool _maxPoolSize_ is converted into byte format, > https://github.com/apache/hbase/blob/a521a80c4b9a8b0749c368d1ff66fea2ed2d77a2/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferPool.java#L85 > > Currently maxPoolSize is logged as below, > 2020-04-10 14:20:56,000 INFO [Time-limited test] io.ByteBufferPool(83): > Created with bufferSize=64 KB and maxPoolSize=320 B
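The findbugs warning quoted above ("Result of integer multiplication cast to long") refers to a general Java pitfall: when two `int` values are multiplied and only the result is cast, the overflow has already happened. The sketch below shows the pattern in isolation. The class `PoolSizeSketch` and its method names are made up for the example, and the sketch does not claim to reproduce the actual line in ByteBufferPool.

```java
// Hypothetical illustration of the warning; not the ByteBufferPool source.
final class PoolSizeSketch {
  // Multiplies in int arithmetic first, so the product can overflow before it is widened.
  static long castAfterMultiply(int bufferSize, int maxPoolSize) {
    return (long) (bufferSize * maxPoolSize);
  }

  // Widens one operand first, so the multiplication is carried out in long arithmetic.
  static long widenBeforeMultiply(int bufferSize, int maxPoolSize) {
    return (long) bufferSize * maxPoolSize;
  }

  public static void main(String[] args) {
    int bufferSize = 64 * 1024; // 64 KB
    int maxPoolSize = 65536;    // number of buffers
    System.out.println(castAfterMultiply(bufferSize, maxPoolSize));   // 0
    System.out.println(widenBeforeMultiply(bufferSize, maxPoolSize)); // 4294967296
  }
}
```

With small pool sizes both forms give the same number, which is why this kind of bug tends to show up only in static analysis or on large configurations.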
[jira] [Reopened] (HBASE-24080) [flakey test] TestRegionReplicaFailover.testSecondaryRegionKill fails.
[ https://issues.apache.org/jira/browse/HBASE-24080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-24080: Reopen for backport this to branch-2.2. > [flakey test] TestRegionReplicaFailover.testSecondaryRegionKill fails. > -- > > Key: HBASE-24080 > URL: https://issues.apache.org/jira/browse/HBASE-24080 > Project: HBase > Issue Type: Test > Components: read replicas >Affects Versions: 3.0.0-alpha-1, 2.3.0 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0 > > > Run into the following error locally: > {code:java} > --- > Test set: org.apache.hadoop.hbase.regionserver.TestRegionReplicaFailover > --- > Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 97.391 s <<< > FAILURE! - in org.apache.hadoop.hbase.regionserver.TestRegionReplicaFailover > org.apache.hadoop.hbase.regionserver.TestRegionReplicaFailover.testSecondaryRegionKill > Time elapsed: 28.682 s <<< FAILURE! > java.lang.AssertionError: Failed verification of row :0 > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > org.apache.hadoop.hbase.HBaseTestingUtility.verifyNumericRows(HBaseTestingUtility.java:2407) > at > org.apache.hadoop.hbase.regionserver.TestRegionReplicaFailover.testSecondaryRegionKill(TestRegionReplicaFailover.java:240) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24022) Set version as 2.2.5-SNAPSHOT in branch-2.2
Guanghao Zhang created HBASE-24022: -- Summary: Set version as 2.2.5-SNAPSHOT in branch-2.2 Key: HBASE-24022 URL: https://issues.apache.org/jira/browse/HBASE-24022 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24023) Add 2.2.4 to download page
Guanghao Zhang created HBASE-24023: -- Summary: Add 2.2.4 to download page Key: HBASE-24023 URL: https://issues.apache.org/jira/browse/HBASE-24023 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24022) Set version as 2.2.5-SNAPSHOT in branch-2.2
[ https://issues.apache.org/jira/browse/HBASE-24022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24022. Fix Version/s: 2.2.5 Assignee: Guanghao Zhang Resolution: Fixed > Set version as 2.2.5-SNAPSHOT in branch-2.2 > --- > > Key: HBASE-24022 > URL: https://issues.apache.org/jira/browse/HBASE-24022 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.5 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24023) Add 2.2.4 to download page
[ https://issues.apache.org/jira/browse/HBASE-24023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24023. Resolution: Fixed > Add 2.2.4 to download page > -- > > Key: HBASE-24023 > URL: https://issues.apache.org/jira/browse/HBASE-24023 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23922) Release 2.2.4
[ https://issues.apache.org/jira/browse/HBASE-23922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23922. Fix Version/s: 2.2.4 Resolution: Fixed > Release 2.2.4 > - > > Key: HBASE-23922 > URL: https://issues.apache.org/jira/browse/HBASE-23922 > Project: HBase > Issue Type: Umbrella > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.4 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24033) Add ut for loading the corrupt recovered hfiles
Guanghao Zhang created HBASE-24033: -- Summary: Add ut for loading the corrupt recovered hfiles Key: HBASE-24033 URL: https://issues.apache.org/jira/browse/HBASE-24033 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24033) Add ut for loading the corrupt recovered hfiles
[ https://issues.apache.org/jira/browse/HBASE-24033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24033. Resolution: Fixed Pushed to branch-2.3+. Thanks [~zhangduo] for reviewing. > Add ut for loading the corrupt recovered hfiles > --- > > Key: HBASE-24033 > URL: https://issues.apache.org/jira/browse/HBASE-24033 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.4.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23633) Find a way to handle the corrupt recovered hfiles
[ https://issues.apache.org/jira/browse/HBASE-23633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23633. Resolution: Fixed > Find a way to handle the corrupt recovered hfiles > - > > Key: HBASE-23633 > URL: https://issues.apache.org/jira/browse/HBASE-23633 > Project: HBase > Issue Type: Bug > Components: MTTR, wal > Affects Versions: 3.0.0, 2.3.0 > Reporter: Guanghao Zhang > Assignee: Pankaj Kumar > Priority: Critical > Fix For: 3.0.0, 2.3.0, 2.4.0 > > > Copied from the PR review comments. > > If the file is a corrupt HFile, an exception will be thrown here, which will cause the region to fail to open. > Maybe we can add a new parameter to control whether to skip the exception, similar to recovered edits, which has the parameter "hbase.hregion.edits.replay.skip.errors"; > > Regions that can't be opened because of detached References or corrupt hfiles are a fact-of-life. We need to work on this issue. This will be a new variant on the problem -- i.e. bad recovered hfiles. > On adding a config to ignore bad files and just open: that's a bit dangerous, as per @infraio, as it could mean silent data loss.
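The trade-off discussed in HBASE-23633 can be sketched as a loader that either fails on a corrupt file or skips it, while always recording what was skipped so the loss is never silent. This is only an illustration of the idea in the comments, not how the issue was actually resolved. The class `RecoveredFilesSketch`, the `skipErrors` flag, and the corruption check passed in as a predicate are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; not the HBase region-open code path.
final class RecoveredFilesSketch {
  // Returns the files that were loaded. Corrupt files either abort the load
  // or are added to 'skipped', depending on the skipErrors flag.
  static List<String> load(List<String> files, Predicate<String> isCorrupt,
      boolean skipErrors, List<String> skipped) {
    List<String> loaded = new ArrayList<>();
    for (String file : files) {
      if (isCorrupt.test(file)) {
        if (!skipErrors) {
          // Default behaviour in this sketch: fail loudly so the region does not open.
          throw new UncheckedIOException(new IOException("Corrupt recovered hfile: " + file));
        }
        // Record the file so an operator can inspect it later.
        skipped.add(file);
        continue;
      }
      loaded.add(file);
    }
    return loaded;
  }

  public static void main(String[] args) {
    List<String> skipped = new ArrayList<>();
    List<String> loaded = load(Arrays.asList("a", "b.corrupt", "c"),
        f -> f.endsWith(".corrupt"), true, skipped);
    System.out.println("loaded=" + loaded + " skipped=" + skipped);
  }
}
```

Keeping the failing behaviour as the default and making the skip opt-in mirrors the concern raised in the thread that ignoring bad files could mean silent data loss.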
[jira] [Resolved] (HBASE-23741) Data loss when WAL split to HFile enabled
[ https://issues.apache.org/jira/browse/HBASE-23741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang resolved HBASE-23741.
Resolution: Fixed

Pushed to branch-2.3+. Thanks [~zhangduo] for reviewing.

> Data loss when WAL split to HFile enabled
>
> Key: HBASE-23741
> URL: https://issues.apache.org/jira/browse/HBASE-23741
> Project: HBase
> Issue Type: Bug
> Components: MTTR
> Affects Versions: 3.0.0, 2.3.0
> Reporter: Pankaj Kumar
> Assignee: Guanghao Zhang
> Priority: Blocker
> Fix For: 3.0.0, 2.3.0, 2.4.0
>
> Very simple steps as below,
> 1. Create table with 1 region
> 2. Insert 1 record
> 3. Flush the table
> 4. Scan table and observe timestamp of the inserted row
> 5. Insert same row key with same timestamp as previously inserted but with different value
> 6. Kill -9 RS where table region is online
> 7. Start RS
> Scan the table and check the result, latest cell must be returned.
> Thanks [~sreenivasulureddy] for finding this issue.
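The reproduction steps above write two cells with the same row key and the same timestamp, so the timestamp alone cannot decide which one is "latest". HBase breaks such ties with a per-edit sequence id, and the sketch below models only that ordering rule. The `Cell` class, the numeric values, and the helper `latest` are assumptions for illustration; the message does not state the root cause of HBASE-23741, and this sketch does not claim to.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of tie-breaking by sequence id; not HBase's CellComparator.
final class LatestCellSketch {
  static final class Cell {
    final long timestamp;
    final long sequenceId;
    final String value;

    Cell(long timestamp, long sequenceId, String value) {
      this.timestamp = timestamp;
      this.sequenceId = sequenceId;
      this.value = value;
    }
  }

  // Newer timestamp first; for equal timestamps, higher sequence id first.
  static final Comparator<Cell> NEWEST_FIRST =
      Comparator.comparingLong((Cell c) -> c.timestamp).reversed()
          .thenComparing(Comparator.comparingLong((Cell c) -> c.sequenceId).reversed());

  // Returns the cell a reader should see among versions of the same row and column.
  static Cell latest(List<Cell> cells) {
    return cells.stream().min(NEWEST_FIRST)
        .orElseThrow(() -> new IllegalArgumentException("no cells"));
  }

  public static void main(String[] args) {
    Cell flushed = new Cell(100L, 5L, "v1");  // written in step 2, flushed in step 3
    Cell replayed = new Cell(100L, 9L, "v2"); // written in step 5, recovered after the kill
    System.out.println(latest(Arrays.asList(flushed, replayed)).value); // v2
  }
}
```

If recovered data ends up with a sequence id that does not sort after the earlier flush, the older value would win the tie, which would look like the data loss the issue title describes.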
[jira] [Created] (HBASE-24037) Add ut for root dir and wal root dir are different
Guanghao Zhang created HBASE-24037: -- Summary: Add ut for root dir and wal root dir are different Key: HBASE-24037 URL: https://issues.apache.org/jira/browse/HBASE-24037 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24344) Release 2.2.5
Guanghao Zhang created HBASE-24344: -- Summary: Release 2.2.5 Key: HBASE-24344 URL: https://issues.apache.org/jira/browse/HBASE-24344 Project: HBase Issue Type: Umbrella Reporter: Guanghao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24411) Set version to 2.2.5 in branch-2.2 for first RC of 2.2.5
Guanghao Zhang created HBASE-24411: -- Summary: Set version to 2.2.5 in branch-2.2 for first RC of 2.2.5 Key: HBASE-24411 URL: https://issues.apache.org/jira/browse/HBASE-24411 Project: HBase Issue Type: Sub-task Affects Versions: 2.2.5 Reporter: Guanghao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24410) Generate CHANGES.md and RELEASENOTES.md for 2.2.5
Guanghao Zhang created HBASE-24410: -- Summary: Generate CHANGES.md and RELEASENOTES.md for 2.2.5 Key: HBASE-24410 URL: https://issues.apache.org/jira/browse/HBASE-24410 Project: HBase Issue Type: Sub-task Affects Versions: 2.2.5 Reporter: Guanghao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24410) Generate CHANGES.md and RELEASENOTES.md for 2.2.5
[ https://issues.apache.org/jira/browse/HBASE-24410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24410. Fix Version/s: 2.2.5 Resolution: Fixed > Generate CHANGES.md and RELEASENOTES.md for 2.2.5 > - > > Key: HBASE-24410 > URL: https://issues.apache.org/jira/browse/HBASE-24410 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.2.5 > Reporter: Guanghao Zhang >Priority: Major > Fix For: 2.2.5 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-23771) [Flakey Tests] Test TestSplitTransactionOnCluster Again
[ https://issues.apache.org/jira/browse/HBASE-23771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-23771: Reopen for backport to branch-2.2. > [Flakey Tests] Test TestSplitTransactionOnCluster Again > --- > > Key: HBASE-23771 > URL: https://issues.apache.org/jira/browse/HBASE-23771 > Project: HBase > Issue Type: Sub-task > Reporter: Michael Stack > Assignee: Michael Stack > Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0 > > Attachments: 0001-HBASE-23771-Flakey-Tests-Test-TestSplitTransactionOn.patch, Screen Shot 2020-01-31 at 8.37.13 AM.png > > > Parent fix had the test failures in GCE go from 35% to 4%. Let me see if I can clear the remaining failures.
[jira] [Resolved] (HBASE-23771) [Flakey Tests] Test TestSplitTransactionOnCluster Again
[ https://issues.apache.org/jira/browse/HBASE-23771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23771. Fix Version/s: 2.2.6 Resolution: Fixed Pushed to branch-2.2. > [Flakey Tests] Test TestSplitTransactionOnCluster Again > --- > > Key: HBASE-23771 > URL: https://issues.apache.org/jira/browse/HBASE-23771 > Project: HBase > Issue Type: Sub-task > Reporter: Michael Stack > Assignee: Michael Stack > Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.6 > > Attachments: 0001-HBASE-23771-Flakey-Tests-Test-TestSplitTransactionOn.patch, Screen Shot 2020-01-31 at 8.37.13 AM.png > > > Parent fix had the test failures in GCE go from 35% to 4%. Let me see if I can clear the remaining failures.
[jira] [Reopened] (HBASE-24410) Generate CHANGES.md and RELEASENOTES.md for 2.2.5
[ https://issues.apache.org/jira/browse/HBASE-24410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-24410: > Generate CHANGES.md and RELEASENOTES.md for 2.2.5 > - > > Key: HBASE-24410 > URL: https://issues.apache.org/jira/browse/HBASE-24410 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.2.5 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.5 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-24115) Relocate test-only REST "client" from src/ to test/ and mark Private
[ https://issues.apache.org/jira/browse/HBASE-24115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-24115: Reopen to add release note. > Relocate test-only REST "client" from src/ to test/ and mark Private > > > Key: HBASE-24115 > URL: https://issues.apache.org/jira/browse/HBASE-24115 > Project: HBase > Issue Type: Test > Components: REST, security >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0, 1.3.7, 1.7.0, 2.4.0, 2.1.10, > 1.4.14, 2.2.5 > > > Relocate test-only REST "client" from src/ to test/ and annotate as Private. > The classes o.a.h.h.rest.Remote* were developed to facilitate REST unit tests > and incorrectly committed to src/ . > Although this "breaks" compatibility by moving public classes to test jar and > marking them private, no attention has been paid to these classes with > respect to performance, convenience, or security. Consensus from various > discussions over the years is to move them to test/ as was intent of the > original committer, but misplaced by the same individual. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24115) Relocate test-only REST "client" from src/ to test/ and mark Private
[ https://issues.apache.org/jira/browse/HBASE-24115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24115. Resolution: Fixed > Relocate test-only REST "client" from src/ to test/ and mark Private > > > Key: HBASE-24115 > URL: https://issues.apache.org/jira/browse/HBASE-24115 > Project: HBase > Issue Type: Test > Components: REST, security >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0, 1.3.7, 1.7.0, 2.4.0, 2.1.10, > 1.4.14, 2.2.5 > > > Relocate test-only REST "client" from src/ to test/ and annotate as Private. > The classes o.a.h.h.rest.Remote* were developed to facilitate REST unit tests > and incorrectly committed to src/ . > Although this "breaks" compatibility by moving public classes to test jar and > marking them private, no attention has been paid to these classes with > respect to performance, convenience, or security. Consensus from various > discussions over the years is to move them to test/ as was intent of the > original committer, but misplaced by the same individual. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24980) Fix dead links in HBase book
[ https://issues.apache.org/jira/browse/HBASE-24980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang resolved HBASE-24980.
Fix Version/s: (was: 2.3.2) 3.0.0-alpha-1
Resolution: Fixed

Pushed to master branch. Thanks [~echohlne] for contributing.

> Fix dead links in HBase book
>
> Key: HBASE-24980
> URL: https://issues.apache.org/jira/browse/HBASE-24980
> Project: HBase
> Issue Type: Bug
> Components: documentation
> Affects Versions: 2.3.0
> Reporter: echohlne
> Assignee: echohlne
> Priority: Major
> Fix For: 3.0.0-alpha-1
>
> 1. https://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/file/tfile/TFile.html => https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/file/tfile/TFile.html
> 2. https://vimeo.com/26804675 => https://www.youtube.com/watch?v=DdGKAorSSZ0
> 3. http://www.cloudera.com/videos/hw10_video_how_stumbleupon_built_and_advertising_platform_using_hbase_and_hadoop is no longer valid and cannot be found on any other website, so just remove it.
> 4. https://hadoop.apache.org/core/docs/stable/api/org/apache/hadoop/metrics/package-summary.html => https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/metrics2/package-summary.html
[jira] [Resolved] (HBASE-24656) [Flakey Tests] branch-2 TestMasterNoCluster.testStopDuringStart
[ https://issues.apache.org/jira/browse/HBASE-24656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24656. Fix Version/s: 2.2.6 Resolution: Fixed Cherry-picked to branch-2.2. > [Flakey Tests] branch-2 TestMasterNoCluster.testStopDuringStart > --- > > Key: HBASE-24656 > URL: https://issues.apache.org/jira/browse/HBASE-24656 > Project: HBase > Issue Type: Bug >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 2.2.6, 2.3.0 > > > org.apache.hadoop.hbase.master.TestMasterNoCluster.testStopDuringStart is > (only) flakey on branch-2 currently. Fails here: > Error Message > KeeperErrorCode = Directory not empty for /hbase/backup-masters > Stacktrace > org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = > Directory not empty for /hbase/backup-masters > at > org.apache.hadoop.hbase.master.TestMasterNoCluster.tearDown(TestMasterNoCluster.java:121) > I can see the zk events in teardown as we purge children as part of cleanup. > Can also see that the backup master registers later. Other than that, log is > opaque on why the teardown is failing. This is just clean up so adding in > retry to see if that helps. -- This message was sent by Atlassian Jira (v8.3.4#803005)
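The fix described for HBASE-24656 is "adding in retry" around a cleanup step that occasionally fails. A generic retry helper of that kind might look like the sketch below. The class `RetrySketch`, its parameters, and the pause between attempts are assumptions; the actual test change is not shown in the message.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; not the change made to TestMasterNoCluster.
final class RetrySketch {
  // Runs the action up to maxAttempts times, pausing between failed attempts,
  // and rethrows the last failure if every attempt fails.
  static <T> T retry(int maxAttempts, long pauseMillis, Supplier<T> action) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.get();
      } catch (RuntimeException e) {
        last = e;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(pauseMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while retrying", ie);
        }
      }
    }
    throw last;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    String result = retry(3, 10L, () -> {
      // Simulates a cleanup that fails twice because a node is "not empty" yet.
      if (++calls[0] < 3) {
        throw new IllegalStateException("Directory not empty");
      }
      return "cleaned";
    });
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

A retry like this suits cleanup code, where a late-registering backup master can briefly repopulate the node being deleted; it would be a poor fit for hiding failures in the behaviour under test.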
[jira] [Reopened] (HBASE-24656) [Flakey Tests] branch-2 TestMasterNoCluster.testStopDuringStart
[ https://issues.apache.org/jira/browse/HBASE-24656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-24656: Reopen for branch-2.2. > [Flakey Tests] branch-2 TestMasterNoCluster.testStopDuringStart > --- > > Key: HBASE-24656 > URL: https://issues.apache.org/jira/browse/HBASE-24656 > Project: HBase > Issue Type: Bug >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Major > Fix For: 2.3.0 > > > org.apache.hadoop.hbase.master.TestMasterNoCluster.testStopDuringStart is > (only) flakey on branch-2 currently. Fails here: > Error Message > KeeperErrorCode = Directory not empty for /hbase/backup-masters > Stacktrace > org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = > Directory not empty for /hbase/backup-masters > at > org.apache.hadoop.hbase.master.TestMasterNoCluster.tearDown(TestMasterNoCluster.java:121) > I can see the zk events in teardown as we purge children as part of cleanup. > Can also see that the backup master registers later. Other than that, log is > opaque on why the teardown is failing. This is just clean up so adding in > retry to see if that helps. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24831) Avoid invoke Counter using reflection in SnapshotInputFormat
[ https://issues.apache.org/jira/browse/HBASE-24831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24831. Fix Version/s: 2.3.2 2.4.0 3.0.0-alpha-1 Resolution: Fixed Pushed to branch-2.3+. Thanks [~chenyechao] for contributing. > Avoid invoke Counter using reflection in SnapshotInputFormat > - > > Key: HBASE-24831 > URL: https://issues.apache.org/jira/browse/HBASE-24831 > Project: HBase > Issue Type: Improvement > Reporter: Yechao Chen > Assignee: Yechao Chen > Priority: Major > Labels: Performance, mapreduce, snapshot > Fix For: 3.0.0-alpha-1, 2.4.0, 2.3.2 > > > In TableRecordReaderImpl we invoke the Counter increment by reflection. > This is called by nextKeyValue() in TableSnapshotInputFormat. > A reflective invocation is much slower than a normal method call. > We can avoid it to improve read performance.
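The two call styles contrasted in HBASE-24831 can be put side by side. In the sketch below, `SimpleCounter` is a made-up stand-in for a MapReduce counter and both helper methods are hypothetical; the point is only that the reflective path looks up and invokes a `Method` on every call, while the direct path is an ordinary method call. No timing is measured here.

```java
import java.lang.reflect.Method;

// Hypothetical illustration only; not the TableRecordReaderImpl source.
final class CounterCallSketch {
  // Stand-in for a counter with an increment(long) method.
  public static final class SimpleCounter {
    private long value;

    public void increment(long delta) {
      value += delta;
    }

    public long getValue() {
      return value;
    }
  }

  // Looks the method up by name and invokes it reflectively on every call.
  static void incrementReflectively(Object counter, long delta) {
    try {
      Method increment = counter.getClass().getMethod("increment", long.class);
      increment.invoke(counter, delta);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("reflective increment failed", e);
    }
  }

  // Plain method call that the JIT can inline.
  static void incrementDirectly(SimpleCounter counter, long delta) {
    counter.increment(delta);
  }

  public static void main(String[] args) {
    SimpleCounter reflective = new SimpleCounter();
    SimpleCounter direct = new SimpleCounter();
    for (int i = 0; i < 5; i++) {
      incrementReflectively(reflective, 1L);
      incrementDirectly(direct, 1L);
    }
    System.out.println(reflective.getValue() + " " + direct.getValue()); // 5 5
  }
}
```

Both paths produce the same count; the difference is per-call overhead, which adds up when the increment runs once per record read.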
[jira] [Resolved] (HBASE-24973) Remove read point parameter in method StoreFlush#performFlush and StoreFlush#createScanner
[ https://issues.apache.org/jira/browse/HBASE-24973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24973. Fix Version/s: 2.4.0 3.0.0-alpha-1 Resolution: Fixed Pushed to branch-2+. Thanks [~yuqi] for contributing. > Remove read point parameter in method StoreFlush#performFlush and > StoreFlush#createScanner > -- > > Key: HBASE-24973 > URL: https://issues.apache.org/jira/browse/HBASE-24973 > Project: HBase > Issue Type: Improvement >Reporter: yuqi >Assignee: yuqi >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.4.0 > > > Currently, the read point parameter in method StoreFlush#performFlush is unused > and can be safely removed. > Method StoreFlush#createScanner can then drop this parameter as well. > See below: > {code:java} > // Some comments here > /** >* Performs memstore flush, writing data from scanner into sink. >* @param scanner Scanner to get data from. >* @param sink Sink to write data to. Could be StoreFile.Writer. >* @param smallestReadPoint Smallest read point used for the flush. >* @param throughputController A controller to avoid flush too fast >*/ > protected void performFlush(InternalScanner scanner, CellSink sink, > long smallestReadPoint, ThroughputController throughputController) > throws IOException > {code} > Parameter smallestReadPoint is not used in this method. When > `smallestReadPoint` is removed, the inner method `createScanner` can drop this > unnecessary parameter too. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24760) Add a config hbase.rsgroup.fallback.enable for RSGroup fallback feature
[ https://issues.apache.org/jira/browse/HBASE-24760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24760. Fix Version/s: 2.4.0 Resolution: Fixed Pushed to branch-2+. Thanks [~Ddupg] for contributing. > Add a config hbase.rsgroup.fallback.enable for RSGroup fallback feature > --- > > Key: HBASE-24760 > URL: https://issues.apache.org/jira/browse/HBASE-24760 > Project: HBase > Issue Type: New Feature > Components: rsgroup >Affects Versions: 3.0.0-alpha-1 >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Major > Fix For: 3.0.0-alpha-1, 2.4.0 > > > In HBASE-22738 we allowed tables to fall back to specific rsgroups if there are no > online servers in the table's rsgroup. > -But for system tables, if there is no specified fallback rsgroup or the > servers in the fallback rsgroup have all gone down, it is necessary to allow > system tables to fall back to any rsgroup in order to keep them available at all > times.- > For availability, the design of rsgroup fallback was refactored, and in the end only > one config property, `hbase.rsgroup.fallback.enable`, was introduced. It allows all > tables, system tables or not, to fall back to the default rsgroup first, > and then to any group if there are no online servers in the default rsgroup. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24913) Refactor TestJMXConnectorServer
[ https://issues.apache.org/jira/browse/HBASE-24913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24913. Fix Version/s: 2.3.2 2.2.7 Resolution: Fixed Pushed to branch-2.2+. Thanks [~Ddupg] for contributing. > Refactor TestJMXConnectorServer > --- > > Key: HBASE-24913 > URL: https://issues.apache.org/jira/browse/HBASE-24913 > Project: HBase > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha-1 >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Major > Fix For: 3.0.0-alpha-1, 2.2.7, 2.3.2 > > > Two optimization points for TestJMXConnectorServer in this issue: > # Just run cluster once, not once per test case. > # Use random free port to run ConnectorServer, avoid specifying a fixed port. -- This message was sent by Atlassian Jira (v8.3.4#803005)
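The second point (a random free port instead of a fixed one) is commonly done by binding to port 0 and letting the OS pick. A minimal sketch, assuming a hypothetical helper class `FreePortSketch`; this is not the code from the TestJMXConnectorServer patch:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

class FreePortSketch {
  /**
   * Binds a socket to port 0 so the OS assigns an unused ephemeral port,
   * reads the assigned port number, and releases the socket again.
   */
  static int findFreePort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("free port: " + findFreePort());
  }
}
```

Note that the port is released before the caller uses it, so another process could in principle grab it in between; a test that cares would retry on a bind failure.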
[jira] [Created] (HBASE-24998) Introduce a ReplicationSourceOverallController interface and decouple ReplicationSourceManager and ReplicationSource
Guanghao Zhang created HBASE-24998: -- Summary: Introduce a ReplicationSourceOverallController interface and decouple ReplicationSourceManager and ReplicationSource Key: HBASE-24998 URL: https://issues.apache.org/jira/browse/HBASE-24998 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25035) Add 2.2.6 to download page
Guanghao Zhang created HBASE-25035: -- Summary: Add 2.2.6 to download page Key: HBASE-25035 URL: https://issues.apache.org/jira/browse/HBASE-25035 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25036) Set version as 2.2.7-SNAPSHOT in branch-2.2
Guanghao Zhang created HBASE-25036: -- Summary: Set version as 2.2.7-SNAPSHOT in branch-2.2 Key: HBASE-25036 URL: https://issues.apache.org/jira/browse/HBASE-25036 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25014) ScheduledChore is never triggered when initalDelay > 1.5*period
[ https://issues.apache.org/jira/browse/HBASE-25014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-25014. Fix Version/s: 2.2.7 2.4.0 2.3.3 Resolution: Fixed > ScheduledChore is never triggered when initalDelay > 1.5*period > --- > > Key: HBASE-25014 > URL: https://issues.apache.org/jira/browse/HBASE-25014 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.2.3, 2.2.4, 2.2.5 >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.7 > > > In our recent tests, ScheduledChore is never triggered when initialDelay > > 1.5*period. > The cause of the bug is the following: > The trigger time for a ScheduledChore must be within an acceptable time window > of 1.5 * period. See > [here|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ScheduledChore.java#L234] > timeOfLastRun and timeOfThisRun are two variables that record two adjacent > trigger times. [The first initialization of > timeOfThisRun|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ScheduledChore.java#L273] > happens when the ScheduledChore is created, so it is not a real trigger time. > If we set initialDelay > 1.5 * period, then after initialDelay, the first time the > chore is triggered is already outside the allowed window. Then we [cancel the chore > and schedule it > again|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ChoreService.java#L176]. > So it is stuck in a loop when initialDelay > 1.5 * period: > 1. init timeOfThisRun at a wrong time. > 2. wait initialDelay > 3. chore triggers, but has exceeded the allowed window. > 4. cancel chore and schedule it again > 5. go to step 1. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
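The loop described above can be reproduced with a small simulation. This is a simplified model built only from the description in the issue, not the actual ScheduledChore or ChoreService code; the class name `ChoreWindowSketch` and the method `countRuns` are hypothetical, and time is advanced by hand rather than read from a clock.

```java
class ChoreWindowSketch {
  /**
   * Simulates a number of trigger attempts and returns how many times the
   * chore body would actually run under the 1.5 * period window check.
   */
  static int countRuns(long initialDelay, long period, int triggers) {
    long now = 0;
    long timeOfThisRun = now;     // step 1: set when the chore is scheduled, not a real trigger
    long nextWait = initialDelay; // step 2: the first trigger comes after initialDelay
    int runs = 0;
    for (int i = 0; i < triggers; i++) {
      now += nextWait;
      long timeOfLastRun = timeOfThisRun;
      timeOfThisRun = now;
      // step 3: the gap between adjacent trigger times must stay inside the window
      boolean missedStartTime = (timeOfThisRun - timeOfLastRun) > 1.5 * period;
      if (missedStartTime) {
        // step 4: cancel and reschedule, which waits initialDelay again (back to step 1)
        nextWait = initialDelay;
      } else {
        runs++;
        nextWait = period;
      }
    }
    return runs;
  }

  public static void main(String[] args) {
    System.out.println(countRuns(2000, 1000, 5)); // initialDelay = 2 * period
    System.out.println(countRuns(1000, 1000, 5)); // initialDelay = period
  }
}
```

In this model, an initialDelay of 2 * period never gets past the window check, while an initialDelay equal to the period runs on every trigger.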
[jira] [Resolved] (HBASE-25009) Hbck chore logs wrong message when loading regions from RS report
[ https://issues.apache.org/jira/browse/HBASE-25009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-25009. Fix Version/s: 2.2.7 2.4.0 2.3.3 3.0.0-alpha-1 Resolution: Fixed Pushed to branch-2.2+. Thanks [~arshad.mohammad] for contributing. > Hbck chore logs wrong message when loading regions from RS report > - > > Key: HBASE-25009 > URL: https://issues.apache.org/jira/browse/HBASE-25009 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.3.1 >Reporter: Mohammad Arshad >Assignee: Mohammad Arshad >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.7 > > > {code:java} > LOG.info("Loaded {} regions from {} regionservers' reports and found {} > orphan regions", > numRegions, rsReports.size(), orphanRegionsOnFS.size()); > {code} > In the above log message, orphanRegionsOnFS.size() should be replaced with > orphanRegionsOnRS.size(), as the regions are loaded from RS, not from FS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25012) HBASE-24359 causes replication missed log of some RemoteException
[ https://issues.apache.org/jira/browse/HBASE-25012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-25012. Fix Version/s: 2.4.0 2.3.3 Resolution: Fixed Pushed to branch-2.3+. Thanks [~Ddupg] for contributing. > HBASE-24359 causes replication missed log of some RemoteException > - > > Key: HBASE-25012 > URL: https://issues.apache.org/jira/browse/HBASE-25012 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.3.1 >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0 > > Attachments: image-2020-09-11-14-30-27-898.png > > > HBASE-24359 broke the exception handling logic. In branch-2, it even > causes some RemoteException logs to be missed. > [File > changed|[https://github.com/apache/hbase/pull/1855/files#diff-1e3f171b19474698601a0752b618af0eL435]] > in branch-2. > !image-2020-09-11-14-30-27-898.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24982) Disassemble the method replicateWALEntry from AdminService to a new interface ReplicationServerService
[ https://issues.apache.org/jira/browse/HBASE-24982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24982. Resolution: Fixed Merged. Thanks [~Ddupg] for contributing. > Disassemble the method replicateWALEntry from AdminService to a new interface > ReplicationServerService > -- > > Key: HBASE-24982 > URL: https://issues.apache.org/jira/browse/HBASE-24982 > Project: HBase > Issue Type: Sub-task > Components: Replication >Reporter: Sun Xin >Assignee: Sun Xin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25177) Try create table with 100 regions for branch-2.2 nightly job's hadoop integration test
[ https://issues.apache.org/jira/browse/HBASE-25177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-25177. Resolution: Won't Fix > Try create table with 100 regions for branch-2.2 nightly job's hadoop > integration test > -- > > Key: HBASE-25177 > URL: https://issues.apache.org/jira/browse/HBASE-25177 > Project: HBase > Issue Type: Bug > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > > It still failed now. > [https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/88/execution/node/171/log/] > > [https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/88//artifact/output-integration/hadoop-2.log] > > It failed when create table with 1000 regions. And not import the example TSV > to HDFS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25177) Try create table with 100 regions for branch-2.2 nightly job's hadoop integration test
Guanghao Zhang created HBASE-25177: -- Summary: Try create table with 100 regions for branch-2.2 nightly job's hadoop integration test Key: HBASE-25177 URL: https://issues.apache.org/jira/browse/HBASE-25177 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang It still failed now. [https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/88/execution/node/171/log/] [https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/88//artifact/output-integration/hadoop-2.log] It failed when create table with 1000 regions. And not import the example TSV to HDFS. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25178) Fix the LICENSE error when branch-2.2 build with hadoop 3.3.0
Guanghao Zhang created HBASE-25178: -- Summary: Fix the LICENSE error when branch-2.2 build with hadoop 3.3.0 Key: HBASE-25178 URL: https://issues.apache.org/jira/browse/HBASE-25178 Project: HBase Issue Type: Bug Affects Versions: 2.2.6 Reporter: Guanghao Zhang See [https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/88/execution/node/163/log/] The build fails when running "mvn clean install -DskipTests -DHBasePatchProcess -Dhadoop-three.version=3.3.0 -Dhadoop.profile=3.0". -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25172) No need timelineservice for branch-2.2 nightly job's hadoop integration test
[ https://issues.apache.org/jira/browse/HBASE-25172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-25172. Resolution: Fixed > No need timelineservice for branch-2.2 nightly job's hadoop integration test > > > Key: HBASE-25172 > URL: https://issues.apache.org/jira/browse/HBASE-25172 > Project: HBase > Issue Type: Bug > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.7 > > > [https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/86/execution/node/171/log/] > > > /home/jenkins/jenkins-home/workspace/HBase_HBase_Nightly_branch-2.2/component/dev-support/hbase_nightly_pseudo-distributed-test.sh > --single-process --working-dir output-integration/hadoop-2 > --hbase-client-install hbase-client hbase-install hadoop-2/bin/hadoop > hadoop-2/share/hadoop/yarn/timelineservice > hadoop-2/share/hadoop/yarn/test/hadoop-yarn-server-tests-2.8.5-tests.jar > hadoop-2/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.5-tests.jar > hadoop-2/bin/mapred > > branch-2.2 still uses hadoop 2.8.5, and hadoop 2.8.5 doesn't have > timelineservice. The dev-support/hbase_nightly_pseudo-distributed-test.sh script does not > account for this timelineservice argument and only accepts 5 parameters. But > branch-2.3+ uses hadoop 2.10.x, so the script there accepts 6 parameters. > > And for hadoop-3, the timelineservice is not used either. See > [https://github.com/apache/hbase/blob/master/dev-support/hbase_nightly_pseudo-distributed-test.sh#L286] > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25178) Remove the hadoop 3.3.0 personality hadoopcheck for branch-2.2/branch-2.3
[ https://issues.apache.org/jira/browse/HBASE-25178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-25178. Resolution: Duplicate Already fixed by HBASE-25144. > Remove the hadoop 3.3.0 personality hadoopcheck for branch-2.2/branch-2.3 > - > > Key: HBASE-25178 > URL: https://issues.apache.org/jira/browse/HBASE-25178 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.6 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > > For branch-2.2, see > [https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/88/execution/node/163/log/] > The build fails when running "mvn clean install -DskipTests -DHBasePatchProcess > -Dhadoop-three.version=3.3.0 -Dhadoop.profile=3.0". > > For branch-2.3, see HBASE-23834. HBase failed to start on hadoop 3.3.0 > because of the jetty problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25200) Try enlarge the flaky test timeout for branch-2.2
Guanghao Zhang created HBASE-25200: -- Summary: Try enlarge the flaky test timeout for branch-2.2 Key: HBASE-25200 URL: https://issues.apache.org/jira/browse/HBASE-25200 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang There are now too many flaky tests to run, so the flaky test job cannot finish. As a result, these tests are marked as flaky again. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25204) Nightly job failed as the name of jdk and maven changed
[ https://issues.apache.org/jira/browse/HBASE-25204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-25204. Fix Version/s: 2.2.7 1.4.14 2.4.0 1.7.0 2.3.3 3.0.0-alpha-1 Resolution: Fixed Pushed to all active branches. Thanks [~zhangduo] for reviewing. > Nightly job failed as the name of jdk and maven changed > > > Key: HBASE-25204 > URL: https://issues.apache.org/jira/browse/HBASE-25204 > Project: HBase > Issue Type: Bug > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 1.7.0, 2.4.0, 1.4.14, 2.2.7 > > > See > [https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/85/console] > [https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/103/console] > > org.codehaus.groovy.control.MultipleCompilationErrorsException: startup > failed: WorkflowScript: 508: Tool type "maven" does not have an install of > "Maven (latest)" configured - did you mean "maven_latest"? @ line 508, column > 19. maven 'Maven (latest)' ^ WorkflowScript: 510: Tool type "jdk" does not > have an install of "JDK 1.8 (latest)" configured - did you mean > "jdk_1.8_latest"? @ line 510, column 17. jdk "JDK 1.8 (latest)" > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25172) No need timelineservice for branch-2.2 nightly job's hadoop integration test
Guanghao Zhang created HBASE-25172: -- Summary: No need timelineservice for branch-2.2 nightly job's hadoop integration test Key: HBASE-25172 URL: https://issues.apache.org/jira/browse/HBASE-25172 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang [https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/86/execution/node/171/log/] /home/jenkins/jenkins-home/workspace/HBase_HBase_Nightly_branch-2.2/component/dev-support/hbase_nightly_pseudo-distributed-test.sh --single-process --working-dir output-integration/hadoop-2 --hbase-client-install hbase-client hbase-install hadoop-2/bin/hadoop hadoop-2/share/hadoop/yarn/timelineservice hadoop-2/share/hadoop/yarn/test/hadoop-yarn-server-tests-2.8.5-tests.jar hadoop-2/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.5-tests.jar hadoop-2/bin/mapred branch-2.2 still uses hadoop 2.8.5, which doesn't have timelineservice. The dev-support/hbase_nightly_pseudo-distributed-test.sh script does not account for this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25204) Nightly job failed as the name of jdk and maven changed
Guanghao Zhang created HBASE-25204: -- Summary: Nightly job failed as the name of jdk and maven changed Key: HBASE-25204 URL: https://issues.apache.org/jira/browse/HBASE-25204 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang See [https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.3/85/console] [https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.2/103/console] org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed: WorkflowScript: 508: Tool type "maven" does not have an install of "Maven (latest)" configured - did you mean "maven_latest"? @ line 508, column 19. maven 'Maven (latest)' ^ WorkflowScript: 510: Tool type "jdk" does not have an install of "JDK 1.8 (latest)" configured - did you mean "jdk_1.8_latest"? @ line 510, column 17. jdk "JDK 1.8 (latest)" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23987) NettyRpcClientConfigHelper will not share event loop by default which is incorrect
[ https://issues.apache.org/jira/browse/HBASE-23987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23987. Fix Version/s: 2.2.6 Resolution: Fixed > NettyRpcClientConfigHelper will not share event loop by default which is > incorrect > -- > > Key: HBASE-23987 > URL: https://issues.apache.org/jira/browse/HBASE-23987 > Project: HBase > Issue Type: Bug > Components: Client, rpc >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0-alpha-1, 2.2.6, 2.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24870) Ignore TestAsyncTableRSCrashPublish
[ https://issues.apache.org/jira/browse/HBASE-24870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24870. Fix Version/s: 2.2.6 Resolution: Fixed > Ignore TestAsyncTableRSCrashPublish > --- > > Key: HBASE-24870 > URL: https://issues.apache.org/jira/browse/HBASE-24870 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.6 > > > [ERROR] Failures: > [ERROR] TestAsyncTableRSCrashPublish.test:94 Waiting timed out after [60,000] > msec > > I have hit this failure many times when running with runAllTests, and other developers hit > it too when voting on an RC. Let's ignore this test for now and re-enable it after the parent > issue is resolved. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24897) RegionReplicaFlushHandler should handle NoServerForRegionException to avoid aborting RegionServer
[ https://issues.apache.org/jira/browse/HBASE-24897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24897. Fix Version/s: 2.2.6 Resolution: Fixed > RegionReplicaFlushHandler should handle NoServerForRegionException to avoid > aborting RegionServer > - > > Key: HBASE-24897 > URL: https://issues.apache.org/jira/browse/HBASE-24897 > Project: HBase > Issue Type: Bug > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.6 > > > While debugging the flaky test TestRegionReplicaReplicationEndpoint, I found the RS aborted > because the RegionReplicaFlushHandler flush failed. When creating a new table with > region replicas, the assign order may be: > # assign 0002 replica region and trigger primary region flush. > # assign 0001 replica region and trigger primary region flush. > # assign primary region. > But the primary region flush may fail because the primary region is not open > yet. So it may abort the RS. > > {code:java} > 2020-08-18 16:56:30,041 INFO > [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] > handler.AssignRegionHandler(141): Opened > testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0002.66e9757a05fbae7623cfea3369fc8354. > 2020-08-18 16:56:30,558 INFO > [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] > handler.AssignRegionHandler(141): Opened > testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0001.22ff45423b0f1f0e93794f673449d140. > 2020-08-18 16:56:31,192 INFO > [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] > handler.AssignRegionHandler(141): Opened > testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463.901f9cd06bbf27ef7c2d70b5af725cd2.
> 2020-08-18 16:58:53,857 ERROR > [RS_REGION_REPLICA_FLUSH_OPS-regionserver/hao-OptiPlex-7050:0-0] > helpers.MarkerIgnoringBase(159): * ABORTING region server > hao-optiplex-7050,36368,1597740961432: ServerAborting because an exception > was thrown * > org.apache.hadoop.hbase.client.NoServerForRegionException: No server address > listed in hbase:meta for region > testRegionReplicaReplicationWithReplicas_10,,1597741128945.0f541dc1a7ca64797c4cf054adb9edfb. > containing row > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:926) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:784) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:140) > at > org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:147) > at > org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getLocation(RegionAdminServiceCallable.java:98) > at > org.apache.hadoop.hbase.client.RegionAdminServiceCallable.prepare(RegionAdminServiceCallable.java:84) > at > org.apache.hadoop.hbase.client.FlushRegionCallable.prepare(FlushRegionCallable.java:62) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105) > at > org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.triggerFlushInPrimaryRegion(RegionReplicaFlushHandler.java:129) > at > org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.process(RegionReplicaFlushHandler.java:78) > at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > I thought the fix should be assign primary region 
first when the region > replica feature is enabled. Will check the implementation of region replica. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
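The issue title asks for the handler to tolerate NoServerForRegionException instead of letting it abort the RegionServer. One possible shape for that is a bounded retry around the flush trigger, sketched below. Everything here is an assumption for illustration: the nested exception class is a local stand-in for the HBase one, `triggerFlushWithRetry` is a hypothetical helper, and the committed fix may work differently.

```java
import java.util.function.Supplier;

class ReplicaFlushRetrySketch {
  /** Local stand-in for org.apache.hadoop.hbase.client.NoServerForRegionException. */
  static class NoServerForRegionException extends RuntimeException {
    NoServerForRegionException(String message) {
      super(message);
    }
  }

  /**
   * Calls flushPrimary up to maxAttempts times. A NoServerForRegionException is
   * treated as "primary not assigned yet" and swallowed, so it never propagates
   * to the caller (which, in the scenario above, would abort the server).
   * Returns true once the flush trigger reports success.
   */
  static boolean triggerFlushWithRetry(Supplier<Boolean> flushPrimary, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        if (flushPrimary.get()) {
          return true;
        }
      } catch (NoServerForRegionException e) {
        // Primary region has no location in meta yet; try again.
        // A real handler would sleep or back off here; omitted to keep the sketch short.
      }
    }
    return false;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    boolean flushed = triggerFlushWithRetry(() -> {
      if (++calls[0] < 3) {
        throw new NoServerForRegionException("primary not open yet");
      }
      return true;
    }, 5);
    System.out.println(flushed + " after " + calls[0] + " calls");
  }
}
```

This only covers the handler side; the ordering change suggested in the message (assign the primary region first) would be a separate fix.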
[jira] [Resolved] (HBASE-24881) Fix flaky TestMasterAbortAndRSGotKilled for branch-2.2
[ https://issues.apache.org/jira/browse/HBASE-24881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24881. Fix Version/s: 2.2.6 Resolution: Fixed > Fix flaky TestMasterAbortAndRSGotKilled for branch-2.2 > -- > > Key: HBASE-24881 > URL: https://issues.apache.org/jira/browse/HBASE-24881 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.6 > > > I hit this problem on branch-2.2 too. It happens because of the > DelayCloseCP. The order of events is: > # Close region. But because of the DelayCloseCP, it will close after 10 > seconds. > # Finish the UT and shut down the cluster. > # Shut down master. > # Shut down RS. Call the waitOnAllRegionsToClose method. But abortRequested is > false at this point. > # The region close fails because the master is down and reporting to the master errors out. > Then the RegionServer aborts and sets abortRequested to true. > # waitOnAllRegionsToClose hangs because the online regions never become empty. > > waitOnAllRegionsToClose(final boolean abort) already considers the abort case, > but the problem is that abortRequested is false when this method is called. I think > the fix should be to keep checking abortRequested inside the > waitOnAllRegionsToClose method. -- This message was sent by Atlassian Jira (v8.3.4#803005)
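The proposed fix, re-reading the abort flag inside the wait loop instead of relying on the value passed in at call time, can be sketched as follows. This is a simplified model, not the HRegionServer code: the class `WaitOnRegionsSketch`, its fields, and the polling interval are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

class WaitOnRegionsSketch {
  final Set<String> onlineRegions = ConcurrentHashMap.newKeySet();
  final AtomicBoolean abortRequested = new AtomicBoolean(false);

  /**
   * Waits until no regions are online. Returns true if all regions closed,
   * false if the wait was given up because an abort was requested, either
   * before the call (abort parameter) or while waiting (abortRequested flag).
   */
  boolean waitOnAllRegionsToClose(boolean abort, long pollMillis) {
    while (!onlineRegions.isEmpty()) {
      // Re-check the live flag on every iteration, not just the argument.
      if (abort || abortRequested.get()) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    WaitOnRegionsSketch rs = new WaitOnRegionsSketch();
    rs.onlineRegions.add("region-that-never-closes");
    // Simulate the abort arriving after the wait has already started.
    new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      rs.abortRequested.set(true);
    }).start();
    System.out.println("all closed: " + rs.waitOnAllRegionsToClose(false, 10));
  }
}
```

With the flag checked only once at call time, the same scenario would loop forever, which matches the hang described in the last step of the event list.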
[jira] [Reopened] (HBASE-24689) Generate CHANGES.md and RELEASENOTES.md for 2.2.6
[ https://issues.apache.org/jira/browse/HBASE-24689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-24689: > Generate CHANGES.md and RELEASENOTES.md for 2.2.6 > - > > Key: HBASE-24689 > URL: https://issues.apache.org/jira/browse/HBASE-24689 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.6 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-23814) Add null checks and logging to misc set of tests
[ https://issues.apache.org/jira/browse/HBASE-23814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-23814: Reopen for cherry-pick to branch-2.2. > Add null checks and logging to misc set of tests > > > Key: HBASE-23814 > URL: https://issues.apache.org/jira/browse/HBASE-23814 > Project: HBase > Issue Type: Test >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Trivial > Fix For: 3.0.0-alpha-1, 2.3.0 > > > I've been studying unit tests of late. A few are failing but then the output > is missing a detail or shutdown complains of NPE because startup didn't > succeed. > Here are super minor items I've been carrying around that I'd like to land. > They do not change the function of tests (there is an attempt at a fix of > TestLogsCleaner). > * TestFullLogReconstruction log the server we've chosen to expire and then > note where we start counting rows > * TestAsyncTableScanException use a define for row counts; count 100 instead > of 1000 and see if it helps > * TestRawAsyncTableLimitedScanWithFilter check connection was made before > closing it in tearDown > * TestLogsCleaner use single mod time. Make it for sure less than now in case > test runs all in the same millisecond (would cause the test to fail) > * TestReplicationBase test table is non-null before closing in tearDown -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23814) Add null checks and logging to misc set of tests
[ https://issues.apache.org/jira/browse/HBASE-23814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-23814. Fix Version/s: 2.2.6 Resolution: Fixed > Add null checks and logging to misc set of tests > > > Key: HBASE-23814 > URL: https://issues.apache.org/jira/browse/HBASE-23814 > Project: HBase > Issue Type: Test >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Trivial > Fix For: 3.0.0-alpha-1, 2.2.6, 2.3.0 > > > I've been studying unit tests of late. A few are failing but then the output > is missing a detail or shutdown complains of NPE because startup didn't > succeed. > Here are super minor items I've been carrying around that I'd like to land. > They do not change the function of tests (there is an attempt at a fix of > TestLogsCleaner). > * TestFullLogReconstruction log the server we've chosen to expire and then > note where we start counting rows > * TestAsyncTableScanException use a define for row counts; count 100 instead > of 1000 and see if it helps > * TestRawAsyncTableLimitedScanWithFilter check connection was made before > closing it in tearDown > * TestLogsCleaner use single mod time. Make it for sure less than now in case > test runs all in the same millisecond (would cause the test to fail) > * TestReplicationBase test table is non-null before closing in tearDown -- This message was sent by Atlassian Jira (v8.3.4#803005)
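The tearDown items in the list above share one pattern: only close a resource if setUp actually created it, so a failed startup does not surface as an unrelated NPE during cleanup. A minimal sketch of that pattern, with a hypothetical helper `closeIfPresent` that is not taken from the HBase tests:

```java
class TearDownNullCheckSketch {
  /**
   * Closes the resource if setUp managed to create it.
   * Returns false when there was nothing to close, true after a successful close.
   */
  static boolean closeIfPresent(AutoCloseable resource) {
    if (resource == null) {
      // setUp failed before this resource was created; nothing to clean up.
      return false;
    }
    try {
      resource.close();
      return true;
    } catch (Exception e) {
      throw new IllegalStateException("close failed during tearDown", e);
    }
  }

  public static void main(String[] args) {
    AutoCloseable neverCreated = null;
    System.out.println(closeIfPresent(neverCreated));
    System.out.println(closeIfPresent(() -> { }));
  }
}
```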
[jira] [Resolved] (HBASE-24928) balanceRSGroup should skip generating balance plan for disabled table and splitParent region
[ https://issues.apache.org/jira/browse/HBASE-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24928. Fix Version/s: 2.3.2 2.2.6 Resolution: Fixed > balanceRSGroup should skip generating balance plan for disabled table and > splitParent region > > > Key: HBASE-24928 > URL: https://issues.apache.org/jira/browse/HBASE-24928 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: niuyulin >Assignee: niuyulin >Priority: Major > Fix For: 3.0.0-alpha-1, 2.2.6, 2.3.2 > > > Currently, we generate balance plans for disabled tables, which is useless: > {code:java} > 2020-08-20,20:47:54,702 WARN > [RpcServer.default.RWQ.Fifo.read.handler=310,queue=6,port=22500] > org.apache.hadoop.hbase.master.HMaster: Failed balance plan: > hri=aa325467924edc865ab2ef6d82f9e2a7, > source=tj1-hadoop-staging-st02.kscn,22600,1572403947348, destination=, just > skip it > org.apache.hadoop.hbase.client.DoNotRetryRegionException: Unexpected state > for rit=CLOSED, location=tj1-hadoop-staging-st02.kscn,22600,1572403947348, > table=galaxysds:sds_staging_258z, region=aa325467924edc865ab2ef6d82f9e2a7 > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.preTransitCheck(AssignmentManager.java:580) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:635) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.moveAsync(AssignmentManager.java:652) > at > org.apache.hadoop.hbase.master.HMaster.executeRegionPlansWithThrottling(HMaster.java:1776) > at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.balanceRSGroup(RSGroupAdminServer.java:486) > at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl.balanceRSGroup(RSGroupAdminEndpoint.java:293) > at > org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService.callMethod(RSGroupAdminProtos.java:13890) > at > 
org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:908) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:135) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24948) Reduce the resource of TestReplicationBase
[ https://issues.apache.org/jira/browse/HBASE-24948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24948. Fix Version/s: 2.2.6 Assignee: Guanghao Zhang Resolution: Fixed > Reduce the resource of TestReplicationBase > --- > > Key: HBASE-24948 > URL: https://issues.apache.org/jira/browse/HBASE-24948 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.6 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24946) Remove the metrics assert in TestClusterRestartFailover
[ https://issues.apache.org/jira/browse/HBASE-24946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang resolved HBASE-24946.
Fix Version/s: 2.2.6
Resolution: Fixed

> Remove the metrics assert in TestClusterRestartFailover
> ---
>
> Key: HBASE-24946
> URL: https://issues.apache.org/jira/browse/HBASE-24946
> Project: HBase
> Issue Type: Sub-task
> Reporter: Guanghao Zhang
> Priority: Major
> Fix For: 2.2.6
>
> MetricsMasterSource masterSource = UTIL.getHBaseCluster().getMaster().getMasterMetrics().getMetricsSource();
> metricsHelper.assertCounter(MetricsMasterSource.SERVER_CRASH_METRIC_PREFIX+"SubmittedCount", 4, masterSource);
>
> This assert was introduced by HBASE-24199, but it is flaky now because this unit test restarts all clusters. The metric is already tested by TestMasterMetrics, so I plan to remove this assert for branch-2.2.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24948) Reduce the resource of TestReplicationBase
Guanghao Zhang created HBASE-24948: -- Summary: Reduce the resource of TestReplicationBase Key: HBASE-24948 URL: https://issues.apache.org/jira/browse/HBASE-24948 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24895) Speed up TestFromClientSide3 by reduce the table regions number
Guanghao Zhang created HBASE-24895:
--

Summary: Speed up TestFromClientSide3 by reduce the table regions number
Key: HBASE-24895
URL: https://issues.apache.org/jira/browse/HBASE-24895
Project: HBase
Issue Type: Sub-task
Reporter: Guanghao Zhang

[https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/branch-2.2/52/testReport/junit/org.apache.hadoop.hbase.client/TestFromClientSide3//]

|[testHTableExistsMethodMultipleRegionsMultipleGets|https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/branch-2.2/52/testReport/junit/org.apache.hadoop.hbase.client/TestFromClientSide3//testHTableExistsMethodMultipleRegionsMultipleGets]|2 min 58 sec|Regression|
|[testHTableExistsMethodMultipleRegionsSingleGet|https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/branch-2.2/52/testReport/junit/org.apache.hadoop.hbase.client/TestFromClientSide3//testHTableExistsMethodMultipleRegionsSingleGet]|4 min 20 sec|Passed|

These tests take too much time and time out.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-22548) Split TestAdmin1
[ https://issues.apache.org/jira/browse/HBASE-22548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-22548. Fix Version/s: 2.2.6 Resolution: Fixed > Split TestAdmin1 > > > Key: HBASE-22548 > URL: https://issues.apache.org/jira/browse/HBASE-22548 > Project: HBase > Issue Type: Test > Components: Admin, test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0-alpha-1, 2.2.6, 2.3.0 > > Attachments: HBASE-22548-branch-2-v1.patch, HBASE-22548-branch-2.patch > > > It is too large and times out easily. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24897) Fix flaky test TestRegionReplicaReplicationEndpoint
Guanghao Zhang created HBASE-24897:
--

Summary: Fix flaky test TestRegionReplicaReplicationEndpoint
Key: HBASE-24897
URL: https://issues.apache.org/jira/browse/HBASE-24897
Project: HBase
Issue Type: Bug
Reporter: Guanghao Zhang

While debugging this unit test, I found that the RS aborted because the RegionReplicaFlushHandler flush failed. When creating a new table with region replicas, the assign order may be:
# assign the 0002 replica region and trigger a primary region flush.
# assign the 0001 replica region and trigger a primary region flush.
# assign the primary region.

But the primary region flush may fail because the primary region is not open yet, so it may abort the RS.
{code:java}
2020-08-18 16:56:30,041 INFO [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] handler.AssignRegionHandler(141): Opened testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0002.66e9757a05fbae7623cfea3369fc8354.
2020-08-18 16:56:30,558 INFO [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] handler.AssignRegionHandler(141): Opened testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0001.22ff45423b0f1f0e93794f673449d140.
2020-08-18 16:56:31,192 INFO [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] handler.AssignRegionHandler(141): Opened testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463.901f9cd06bbf27ef7c2d70b5af725cd2.
2020-08-18 16:58:53,857 ERROR [RS_REGION_REPLICA_FLUSH_OPS-regionserver/hao-OptiPlex-7050:0-0] helpers.MarkerIgnoringBase(159): * ABORTING region server hao-optiplex-7050,36368,1597740961432: ServerAborting because an exception was thrown *
org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in hbase:meta for region testRegionReplicaReplicationWithReplicas_10,,1597741128945.0f541dc1a7ca64797c4cf054adb9edfb. containing row
  at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:926)
  at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:784)
  at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:140)
  at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:147)
  at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getLocation(RegionAdminServiceCallable.java:98)
  at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.prepare(RegionAdminServiceCallable.java:84)
  at org.apache.hadoop.hbase.client.FlushRegionCallable.prepare(FlushRegionCallable.java:62)
  at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
  at org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.triggerFlushInPrimaryRegion(RegionReplicaFlushHandler.java:129)
  at org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.process(RegionReplicaFlushHandler.java:78)
  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
{code}
I think the fix should be to assign the primary region first when the region replica feature is enabled. I will check the implementation of region replica.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
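The ordering proposed in HBASE-24897 (primary region before its replicas) can be sketched without any HBase dependency. This only illustrates the intended order and is not the actual fix: the class ReplicaAssignOrder is hypothetical, and the sketch assumes the primary region carries replica id 0 while the 0001 and 0002 replicas carry ids 1 and 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ReplicaAssignOrder {
  /**
   * Returns a sorted copy of the given replica ids.
   * Assumption: id 0 is the primary, so ascending order assigns it first.
   */
  static List<Integer> primaryFirst(List<Integer> replicaIds) {
    List<Integer> ordered = new ArrayList<>(replicaIds);
    ordered.sort(Comparator.naturalOrder());
    return ordered;
  }

  public static void main(String[] args) {
    // The problematic order from the report: 0002, then 0001, then the primary.
    System.out.println(primaryFirst(Arrays.asList(2, 1, 0)));
  }
}
```

With the primary assigned first, a replica that triggers a primary flush on open would find the primary already listed in hbase:meta.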
[jira] [Created] (HBASE-24907) Turn off the balancer when test region admin api
Guanghao Zhang created HBASE-24907: -- Summary: Turn off the balancer when test region admin api Key: HBASE-24907 URL: https://issues.apache.org/jira/browse/HBASE-24907 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang The region admin API tests exercise move/split/merge/assign/unassign and then check whether the region location is correct. But the balancer may move regions elsewhere and break the UT, so turn off the balancer for TestAsyncRegionAdminApi. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24904) Split TestAsyncTableAdminApi and TestSnapshotTemporaryDirectoryWithRegionReplicas
Guanghao Zhang created HBASE-24904: -- Summary: Split TestAsyncTableAdminApi and TestSnapshotTemporaryDirectoryWithRegionReplicas Key: HBASE-24904 URL: https://issues.apache.org/jira/browse/HBASE-24904 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang See [https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/branch-2.2/42/testReport/org.apache.hadoop.hbase.client/TestAsyncTableAdminApi/] [https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/branch-2.2/61/testReport/junit/org.apache.hadoop.hbase.client/TestSnapshotTemporaryDirectoryWithRegionReplicas//] These UTs are flaky because they take too much time, more than 780 seconds. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24906) Enlarge the wait time in TestReplicationEndpoint#testInterClusterReplication
Guanghao Zhang created HBASE-24906:
--

Summary: Enlarge the wait time in TestReplicationEndpoint#testInterClusterReplication
Key: HBASE-24906
URL: https://issues.apache.org/jira/browse/HBASE-24906
Project: HBase
Issue Type: Sub-task
Reporter: Guanghao Zhang

The test failed many times, but each failure reports a different number of replicated entries. This means replication is working and just needs more time to replicate all 2500 entries.

h3. Error Message
Waiting timed out after [30,000] msec Failed to replicate all edits, expected = 2500 replicated = 2499

h3. Error Message
Waiting timed out after [30,000] msec Failed to replicate all edits, expected = 2500 replicated = 2481

h3. Error Message
Waiting timed out after [30,000] msec Failed to replicate all edits, expected = 2500 replicated = 2491

-- This message was sent by Atlassian Jira (v8.3.4#803005)
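The failures above come from a poll-until-true wait that gives up at a fixed 30,000 ms. A minimal sketch of such a wait with the timeout as a parameter is shown below. It is a standard-library stand-in, not HBase's own test utility: WaitSketch, waitFor, and the simulated replication counter are hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class WaitSketch {
  /**
   * Polls the condition every intervalMs until it holds or timeoutMs elapses.
   * Returns true if the condition became true, false on timeout or interrupt.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out; the caller decides how to report it
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }

  public static void main(String[] args) {
    AtomicInteger replicated = new AtomicInteger();
    // Simulated progress: each poll observes 500 more replicated entries.
    boolean done = waitFor(60_000, 1, () -> replicated.addAndGet(500) >= 2500);
    System.out.println("all edits replicated: " + done);
  }
}
```

Because the condition is checked before the deadline on every iteration, enlarging the timeout costs nothing when replication finishes early and only helps the slow runs that ended at 2481 to 2499 entries.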
[jira] [Resolved] (HBASE-24895) Speed up TestFromClientSide3 by reduce the table regions number
[ https://issues.apache.org/jira/browse/HBASE-24895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang resolved HBASE-24895.
Fix Version/s: 2.2.6
Resolution: Fixed

> Speed up TestFromClientSide3 by reduce the table regions number
> ---
>
> Key: HBASE-24895
> URL: https://issues.apache.org/jira/browse/HBASE-24895
> Project: HBase
> Issue Type: Sub-task
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Major
> Fix For: 2.2.6
>
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/branch-2.2/52/testReport/junit/org.apache.hadoop.hbase.client/TestFromClientSide3//]
>
> |[testHTableExistsMethodMultipleRegionsMultipleGets|https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/branch-2.2/52/testReport/junit/org.apache.hadoop.hbase.client/TestFromClientSide3//testHTableExistsMethodMultipleRegionsMultipleGets]|2 min 58 sec|Regression|
> |[testHTableExistsMethodMultipleRegionsSingleGet|https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/branch-2.2/52/testReport/junit/org.apache.hadoop.hbase.client/TestFromClientSide3//testHTableExistsMethodMultipleRegionsSingleGet]|4 min 20 sec|Passed|
>
> These tests take too much time and time out.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24912) Enlarge MemstoreFlusherChore/CompactionChecker period for unit test
Guanghao Zhang created HBASE-24912:
--

Summary: Enlarge MemstoreFlusherChore/CompactionChecker period for unit test
Key: HBASE-24912
URL: https://issues.apache.org/jira/browse/HBASE-24912
Project: HBase
Issue Type: Improvement
Reporter: Guanghao Zhang

Too many debug logs are printed when running unit tests now.

2020-08-19 01:20:59,899 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-08-19 01:20:59,899 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-08-19 01:20:59,900 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-08-19 01:20:59,900 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-08-19 01:20:59,905 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
2020-08-19 01:20:59,905 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
2020-08-19 01:21:00,001 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24907) Turn off the balancer when test region admin api
[ https://issues.apache.org/jira/browse/HBASE-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24907. Fix Version/s: 2.2.6 Resolution: Fixed > Turn off the balancer when test region admin api > > > Key: HBASE-24907 > URL: https://issues.apache.org/jira/browse/HBASE-24907 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang >Priority: Major > Fix For: 2.2.6 > > > The region admin API tests exercise move/split/merge/assign/unassign and then check whether the region location is correct. But the balancer may move regions elsewhere and break the UT, so turn off the balancer for TestAsyncRegionAdminApi. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24912) Enlarge MemstoreFlusherChore/CompactionChecker period for unit test
[ https://issues.apache.org/jira/browse/HBASE-24912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang resolved HBASE-24912.
Fix Version/s: 2.3.2 2.2.6 2.4.0 3.0.0-alpha-1
Resolution: Fixed

Pushed to branch-2.2+. Thanks [~stack] for reviewing.

> Enlarge MemstoreFlusherChore/CompactionChecker period for unit test
> ---
>
> Key: HBASE-24912
> URL: https://issues.apache.org/jira/browse/HBASE-24912
> Project: HBase
> Issue Type: Improvement
> Reporter: Guanghao Zhang
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.2.6, 2.3.2
>
> Too many debug logs are printed when running unit tests now.
>
> 2020-08-19 01:20:59,899 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
> 2020-08-19 01:20:59,899 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
> 2020-08-19 01:20:59,900 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
> 2020-08-19 01:20:59,900 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
> 2020-08-19 01:20:59,905 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): MemstoreFlusherChore execution time: 0 ms.
> 2020-08-19 01:20:59,905 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.
> 2020-08-19 01:21:00,001 DEBUG [regionserver/asf909:0.Chore.1] hbase.ScheduledChore(192): CompactionChecker execution time: 0 ms.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24689) Generate CHANGES.md and RELEASENOTES.md for 2.2.6
[ https://issues.apache.org/jira/browse/HBASE-24689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24689. Resolution: Fixed > Generate CHANGES.md and RELEASENOTES.md for 2.2.6 > - > > Key: HBASE-24689 > URL: https://issues.apache.org/jira/browse/HBASE-24689 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.6 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24906) Enlarge the wait time in TestReplicationEndpoint/TestMetaWithReplicasBasic
[ https://issues.apache.org/jira/browse/HBASE-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang resolved HBASE-24906.
Fix Version/s: 2.2.6
Resolution: Fixed

> Enlarge the wait time in TestReplicationEndpoint/TestMetaWithReplicasBasic
> --
>
> Key: HBASE-24906
> URL: https://issues.apache.org/jira/browse/HBASE-24906
> Project: HBase
> Issue Type: Sub-task
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Major
> Fix For: 2.2.6
>
> The test failed many times, but each failure reports a different number of replicated entries. This means replication is working and just needs more time to replicate all 2500 entries.
>
> h3. Error Message
> Waiting timed out after [30,000] msec Failed to replicate all edits, expected = 2500 replicated = 2499
>
> h3. Error Message
> Waiting timed out after [30,000] msec Failed to replicate all edits, expected = 2500 replicated = 2481
>
> h3. Error Message
> Waiting timed out after [30,000] msec Failed to replicate all edits, expected = 2500 replicated = 2491

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24904) Speed up some unit tests
[ https://issues.apache.org/jira/browse/HBASE-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24904. Fix Version/s: 2.2.6 Resolution: Fixed > Speed up some unit tests > > > Key: HBASE-24904 > URL: https://issues.apache.org/jira/browse/HBASE-24904 > Project: HBase > Issue Type: Sub-task > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang >Priority: Major > Fix For: 2.2.6 > > > See > [https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/branch-2.2/42/testReport/org.apache.hadoop.hbase.client/TestAsyncTableAdminApi/] > [https://ci-hadoop.apache.org/job/HBase/job/HBase-Flaky-Tests/job/branch-2.2/61/testReport/junit/org.apache.hadoop.hbase.client/TestSnapshotTemporaryDirectoryWithRegionReplicas//] > > These UTs are flaky because they take too much time, more than 780 seconds. > > Split TestAsyncTableAdminApi/TestAdminShell/TestLoadIncrementalHFiles > > Reduce the number of regions in TestSnapshotTemporaryDirectoryWithRegionReplicas/TestRegionReplicaFailover/TestSCP* -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24052) Add debug+fix to TestMasterShutdown
[ https://issues.apache.org/jira/browse/HBASE-24052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-24052. Fix Version/s: 2.2.6 Resolution: Fixed Pushed to branch-2.2. > Add debug+fix to TestMasterShutdown > --- > > Key: HBASE-24052 > URL: https://issues.apache.org/jira/browse/HBASE-24052 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Trivial > Fix For: 3.0.0-alpha-1, 2.2.6, 2.3.0 > > Attachments: > 0001-HBASE-24052-Add-debug-to-TestMasterShutdown.addendum.patch, > 0001-HBASE-24052-Add-debug-to-TestMasterShutdown.addendum2.patch, > 0001-HBASE-24052-Add-debug-to-TestMasterShutdown.patch > > > Temporarily add debug to TestMasterShutdown overnight to learn more about a > test failure not reproducible locally. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-24052) Add debug+fix to TestMasterShutdown
[ https://issues.apache.org/jira/browse/HBASE-24052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-24052: Reopen for cherry-pick to branch-2.2. > Add debug+fix to TestMasterShutdown > --- > > Key: HBASE-24052 > URL: https://issues.apache.org/jira/browse/HBASE-24052 > Project: HBase > Issue Type: Sub-task >Reporter: Michael Stack >Assignee: Michael Stack >Priority: Trivial > Fix For: 3.0.0-alpha-1, 2.3.0 > > Attachments: > 0001-HBASE-24052-Add-debug-to-TestMasterShutdown.addendum.patch, > 0001-HBASE-24052-Add-debug-to-TestMasterShutdown.addendum2.patch, > 0001-HBASE-24052-Add-debug-to-TestMasterShutdown.patch > > > Temporarily add debug to TestMasterShutdown overnight to learn more about a > test failure not reproducible locally. -- This message was sent by Atlassian Jira (v8.3.4#803005)