[jira] [Created] (HBASE-21511) Remove in progress snapshot check in SnapshotFileCache#getUnreferencedFiles
Ted Yu created HBASE-21511: -- Summary: Remove in progress snapshot check in SnapshotFileCache#getUnreferencedFiles Key: HBASE-21511 URL: https://issues.apache.org/jira/browse/HBASE-21511 Project: HBase Issue Type: Improvement Reporter: Ted Yu Attachments: 21511.v1.txt During review of HBASE-21387, [~Apache9] mentioned that the check for in progress snapshots in SnapshotFileCache#getUnreferencedFiles is no longer needed now that the snapshot hfile cleaner and snapshot taking are mutually exclusive. This issue addresses that review comment by removing the check for in progress snapshots. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-21387) Race condition surrounding in progress snapshot handling in snapshot cache leads to loss of snapshot files
[ https://issues.apache.org/jira/browse/HBASE-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-21387: > Race condition surrounding in progress snapshot handling in snapshot cache > leads to loss of snapshot files > -- > > Key: HBASE-21387 > URL: https://issues.apache.org/jira/browse/HBASE-21387 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Labels: snapshot > Fix For: 3.0.0, 1.5.0, 1.3.3, 2.2.0, 2.0.3, 1.4.9, 2.1.2, 1.2.10 > > Attachments: 0001-UT.patch, 21387-suggest.txt, 21387.dbg.txt, > 21387.v10.txt, 21387.v11.txt, 21387.v12.txt, 21387.v2.txt, 21387.v3.txt, > 21387.v6.txt, 21387.v7.txt, 21387.v8.txt, 21387.v9.txt, > HBASE-21387.branch-1.2.patch, HBASE-21387.branch-1.3.patch, > HBASE-21387.branch-1.patch, HBASE-21387.v13.patch, HBASE-21387.v14.patch, > HBASE-21387.v15.patch, HBASE-21387.v16.patch, HBASE-21387.v17.patch, > two-pass-cleaner.v4.txt, two-pass-cleaner.v6.txt, two-pass-cleaner.v9.txt > > > During recent report from customer where ExportSnapshot failed: > {code} > 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] > snapshot.SnapshotReferenceUtil: Can't find hfile: > 44f6c3c646e84de6a63fe30da4fcb3aa in the real > (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa) > or archive > (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa) > directory for the primary table. > {code} > We found the following in log: > {code} > 2018-10-09 18:54:23,675 DEBUG > [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] > cleaner.HFileCleaner: Removing: > hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa > from archive > {code} > The root cause is race condition surrounding in progress snapshot(s) handling > between refreshCache() and getUnreferencedFiles(). > There are two callers of refreshCache: one from RefreshCacheTask#run and the > other from SnapshotHFileCleaner. 
> Let's look at the code of refreshCache: > {code} > if (!name.equals(SnapshotDescriptionUtils.SNAPSHOT_TMP_DIR_NAME)) { > {code} > whose intention is to exclude in progress snapshot(s). > Suppose when the RefreshCacheTask runs refreshCache, there is some in > progress snapshot (about to finish). > When SnapshotHFileCleaner calls getUnreferencedFiles(), it sees that > lastModifiedTime is up to date. So the cleaner proceeds to check in progress > snapshot(s). However, the snapshot has completed by that time, resulting in > some file(s) being deemed unreferenced. > Here is a timeline given by Josh illustrating the scenario: > At time T0, we are checking if F1 is referenced. At time T1, there is a > snapshot S1 in progress that is referencing a file F1. refreshCache() is > called, but no completed snapshot references F1. At T2, the snapshot S1, > which references F1, completes. At T3, we check in-progress snapshots and S1 > is not included. Thus, F1 is marked as unreferenced even though S1 references > it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
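The T0-T3 timeline above can be condensed into a minimal, self-contained Java sketch. The class and method names here are illustrative only, not the actual HBase classes: the point is the check-then-act window between consulting completed snapshots and consulting in-progress ones.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal model of the race described above. A file F1 referenced only by an
// in-progress snapshot S1 is checked: the completed-snapshot cache is consulted
// (S1 not there yet), S1 completes in between, then the in-progress check runs
// (S1 no longer there). All names are illustrative.
public class SnapshotRaceSketch {
  static Set<String> completedRefs = new HashSet<>();   // files referenced by completed snapshots
  static Set<String> inProgressRefs = new HashSet<>();  // files referenced by in-progress snapshots

  static boolean isReferenced(String file) {
    boolean inCompleted = completedRefs.contains(file); // T1: no completed snapshot references F1
    // T2: S1 completes between the two checks
    if (inProgressRefs.remove(file)) {
      completedRefs.add(file);
    }
    boolean inProgress = inProgressRefs.contains(file); // T3: in-progress check no longer sees S1
    return inCompleted || inProgress;
  }

  static boolean demo() {
    inProgressRefs.add("F1"); // snapshot S1, in progress, references F1
    return isReferenced("F1");
  }

  public static void main(String[] args) {
    System.out.println(demo()); // false: F1 is wrongly deemed unreferenced
  }
}
```

Making the cleaner and snapshot creation mutually exclusive (as HBASE-21511 above notes) closes this window entirely, which is why the T3-style in-progress check becomes redundant.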
[jira] [Created] (HBASE-21482) TestHRegion fails due to 'Too many open files'
Ted Yu created HBASE-21482: -- Summary: TestHRegion fails due to 'Too many open files' Key: HBASE-21482 URL: https://issues.apache.org/jira/browse/HBASE-21482 Project: HBase Issue Type: Bug Reporter: Ted Yu TestHRegion fails due to 'Too many open files' in master branch. Here is one failed subtest : {code} testCheckAndDelete_ThatDeleteWasWritten(org.apache.hadoop.hbase.regionserver.TestHRegion) Time elapsed: 2.373 sec <<< ERROR! java.lang.IllegalStateException: failed to create a child event loop at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853) at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844) at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835) at org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034) Caused by: org.apache.hbase.thirdparty.io.netty.channel.ChannelException: failed to open a new selector at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853) at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844) at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835) at org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034) Caused by: java.io.IOException: Too many open files at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4853) at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4844) at org.apache.hadoop.hbase.regionserver.TestHRegion.initHRegion(TestHRegion.java:4835) at org.apache.hadoop.hbase.regionserver.TestHRegion.testCheckAndDelete_ThatDeleteWasWritten(TestHRegion.java:2034) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21479) TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent fails with IndexOutOfBoundsException
Ted Yu created HBASE-21479: -- Summary: TestHRegionReplayEvents#testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent fails with IndexOutOfBoundsException Key: HBASE-21479 URL: https://issues.apache.org/jira/browse/HBASE-21479 Project: HBase Issue Type: Bug Reporter: Ted Yu The test fails in both master branch and branch-2 : {code} testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents) Time elapsed: 3.74 sec <<< ERROR! java.lang.IndexOutOfBoundsException: Index: 2, Size: 1 at org.apache.hadoop.hbase.regionserver.TestHRegionReplayEvents.testSkippingEditsWithSmallerSeqIdAfterRegionOpenEvent(TestHRegionReplayEvents.java:1042) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21466) WALProcedureStore uses wrong FileSystem if wal.dir is on different FileSystem as rootdir
Ted Yu created HBASE-21466: -- Summary: WALProcedureStore uses wrong FileSystem if wal.dir is on different FileSystem as rootdir Key: HBASE-21466 URL: https://issues.apache.org/jira/browse/HBASE-21466 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu In the WALProcedureStore ctor, the fs field is initialized this way: {code} this.fs = walDir.getFileSystem(conf); {code} However, when wal.dir is on a different FileSystem than rootdir, the above would return the wrong FileSystem. In the modified TestMasterProcedureEvents, without the fix, the master wouldn't initialize. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
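As a sketch of why each path must resolve its own FileSystem handle, consider a deployment where rootdir and wal.dir use different schemes. This uses only the plain JDK (java.net.URI) rather than the Hadoop API, and the two paths below are hypothetical:

```java
import java.net.URI;

// Illustration (plain JDK, hypothetical paths): when wal.dir and rootdir live
// on different filesystems, each path must resolve its own FileSystem, in the
// spirit of walDir.getFileSystem(conf); reusing a handle derived from the
// rootdir would pick the wrong scheme.
public class WalFsSketch {
  static String schemeOf(String dir) {
    return URI.create(dir).getScheme(); // each path carries its own scheme
  }

  public static void main(String[] args) {
    String rootDir = "wasb://container@account/hbase";            // store files
    String walDir = "hdfs://namenode:8020/hbase/MasterProcWALs";  // procedure WALs
    System.out.println(schemeOf(rootDir)); // wasb
    System.out.println(schemeOf(walDir));  // hdfs: a rootdir-derived handle would be wrong here
  }
}
```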
[jira] [Created] (HBASE-21457) BackupUtils#getWALFilesOlderThan refers to wrong FileSystem
Ted Yu created HBASE-21457: -- Summary: BackupUtils#getWALFilesOlderThan refers to wrong FileSystem Key: HBASE-21457 URL: https://issues.apache.org/jira/browse/HBASE-21457 Project: HBase Issue Type: Bug Reporter: Janos Gub Janos reported a backup test failure when using a local HDFS for WALs while using WASB/ADLS only for store files. He spotted the code in BackupUtils#getWALFilesOlderThan, which uses the HBase root dir for retrieving WAL files. We should use the helper methods from CommonFSUtils instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-21247) Custom Meta WAL Provider doesn't default to custom WAL Provider whose configuration value is outside the enums in Providers
[ https://issues.apache.org/jira/browse/HBASE-21247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-21247: > Custom Meta WAL Provider doesn't default to custom WAL Provider whose > configuration value is outside the enums in Providers > --- > > Key: HBASE-21247 > URL: https://issues.apache.org/jira/browse/HBASE-21247 > Project: HBase > Issue Type: Bug > Components: wal >Affects Versions: 3.0.0, 2.2.0, 2.1.1, 2.0.2 >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 3.0.0 > > Attachments: 21247.branch-2.patch, 21247.v1.txt, 21247.v10.txt, > 21247.v11.txt, 21247.v2.txt, 21247.v3.txt, 21247.v4.tst, 21247.v4.txt, > 21247.v5.txt, 21247.v6.txt, 21247.v7.txt, 21247.v8.txt, 21247.v9.txt > > > Currently all the WAL Providers acceptable to hbase are specified in > Providers enum of WALFactory. > This restricts the ability for custom Meta WAL Provider to default to the > custom WAL Provider which is supplied by class name. > This issue fixes the bug by allowing the specification of new WAL Provider > class name using the config "hbase.wal.provider". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21438) TestAdmin2#testGetProcedures fails due to FailedProcedure inaccessible
Ted Yu created HBASE-21438: -- Summary: TestAdmin2#testGetProcedures fails due to FailedProcedure inaccessible Key: HBASE-21438 URL: https://issues.apache.org/jira/browse/HBASE-21438 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu From https://builds.apache.org/job/HBase-Flaky-Tests/job/master/1863/testReport/org.apache.hadoop.hbase.client/TestAdmin2/testGetProcedures/ : {code} Mon Nov 05 04:52:13 UTC 2018, RpcRetryingCaller{globalStartTime=1541393533029, pause=250, maxAttempts=7}, org.apache.hadoop.hbase.procedure2.BadProcedureException: org.apache.hadoop.hbase.procedure2.BadProcedureException: The procedure class org.apache.hadoop.hbase.procedure2.FailedProcedure must be accessible and have an empty constructor at org.apache.hadoop.hbase.procedure2.ProcedureUtil.validateClass(ProcedureUtil.java:82) at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProtoProcedure(ProcedureUtil.java:162) at org.apache.hadoop.hbase.master.MasterRpcServices.getProcedures(MasterRpcServices.java:1249) at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21416) Intermittent TestRegionInfoDisplay failure due to shift in relTime of RegionState#toDescriptiveString
Ted Yu created HBASE-21416: -- Summary: Intermittent TestRegionInfoDisplay failure due to shift in relTime of RegionState#toDescriptiveString Key: HBASE-21416 URL: https://issues.apache.org/jira/browse/HBASE-21416 Project: HBase Issue Type: Test Reporter: Ted Yu From https://builds.apache.org/job/HBase-Flaky-Tests/job/branch-2.1/1799/testReport/junit/org.apache.hadoop.hbase.client/TestRegionInfoDisplay/testRegionDetailsForDisplay/ : {code} org.junit.ComparisonFailure: expected:<...:30 UTC 2018 (PT0.00[6]S ago), server=null> but was:<...:30 UTC 2018 (PT0.00[7]S ago), server=null> at org.apache.hadoop.hbase.client.TestRegionInfoDisplay.testRegionDetailsForDisplay(TestRegionInfoDisplay.java:78) {code} Here is how toDescriptiveString composes relTime: {code} long relTime = System.currentTimeMillis() - stamp; {code} In the test, state.toDescriptiveString() is called twice for the assertion; different return values from System.currentTimeMillis() caused the assertion to fail on this occasion. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
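A sketch of the underlying flakiness and one common remedy: read the clock once and thread the value through, so two renderings of the same state agree. The method below is a hypothetical stand-in for RegionState#toDescriptiveString, not the actual HBase fix:

```java
// Hypothetical stand-in for RegionState#toDescriptiveString: the rendered
// string embeds a duration relative to "now", so two calls that each read
// System.currentTimeMillis() can observe values one millisecond apart and
// break equality assertions. Passing the clock reading in as a parameter
// makes the output deterministic.
public class RelTimeSketch {
  static String toDescriptiveString(long nowMillis, long stampMillis) {
    long relTime = nowMillis - stampMillis;
    return "rit=OPEN (" + relTime + " ms ago), server=null";
  }

  public static void main(String[] args) {
    long stamp = System.currentTimeMillis();
    long now = System.currentTimeMillis(); // single clock read shared by both calls
    String first = toDescriptiveString(now, stamp);
    String second = toDescriptiveString(now, stamp);
    System.out.println(first.equals(second)); // true: no relTime shift between the calls
  }
}
```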
[jira] [Resolved] (HBASE-21180) findbugs incurs DataflowAnalysisException for hbase-server module
[ https://issues.apache.org/jira/browse/HBASE-21180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-21180. Resolution: Cannot Reproduce > findbugs incurs DataflowAnalysisException for hbase-server module > - > > Key: HBASE-21180 > URL: https://issues.apache.org/jira/browse/HBASE-21180 > Project: HBase > Issue Type: Task >Reporter: Ted Yu >Priority: Minor > > Running findbugs, I noticed the following in hbase-server module: > {code} > [INFO] --- findbugs-maven-plugin:3.0.4:findbugs (default-cli) @ hbase-server > --- > [INFO] Fork Value is true > [java] The following errors occurred during analysis: > [java] Error generating derefs for > org.apache.hadoop.hbase.generated.master.table_jsp._jspService(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V > [java] edu.umd.cs.findbugs.ba.DataflowAnalysisException: can't get > position -1 of stack > [java] At > edu.umd.cs.findbugs.ba.Frame.getStackValue(Frame.java:250) > [java] At > edu.umd.cs.findbugs.ba.Hierarchy.resolveMethodCallTargets(Hierarchy.java:743) > [java] At > edu.umd.cs.findbugs.ba.npe.DerefFinder.getAnalysis(DerefFinder.java:141) > [java] At > edu.umd.cs.findbugs.classfile.engine.bcel.UsagesRequiringNonNullValuesFactory.analyze(UsagesRequiringNonNullValuesFactory.java:50) > [java] At > edu.umd.cs.findbugs.classfile.engine.bcel.UsagesRequiringNonNullValuesFactory.analyze(UsagesRequiringNonNullValuesFactory.java:31) > [java] At > edu.umd.cs.findbugs.classfile.impl.AnalysisCache.analyzeMethod(AnalysisCache.java:369) > [java] At > edu.umd.cs.findbugs.classfile.impl.AnalysisCache.getMethodAnalysis(AnalysisCache.java:322) > [java] At > edu.umd.cs.findbugs.ba.ClassContext.getMethodAnalysis(ClassContext.java:1005) > [java] At > edu.umd.cs.findbugs.ba.ClassContext.getUsagesRequiringNonNullValues(ClassContext.java:325) > [java] At > edu.umd.cs.findbugs.detect.FindNullDeref.foundGuaranteedNullDeref(FindNullDeref.java:1510) > [java] At > 
edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.reportBugs(NullDerefAndRedundantComparisonFinder.java:361) > [java] At > edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.examineNullValues(NullDerefAndRedundantComparisonFinder.java:266) > [java] At > edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.execute(NullDerefAndRedundantComparisonFinder.java:164) > [java] At > edu.umd.cs.findbugs.detect.FindNullDeref.analyzeMethod(FindNullDeref.java:278) > [java] At > edu.umd.cs.findbugs.detect.FindNullDeref.visitClassContext(FindNullDeref.java:209) > [java] At > edu.umd.cs.findbugs.DetectorToDetector2Adapter.visitClass(DetectorToDetector2Adapter.java:76) > [java] At > edu.umd.cs.findbugs.FindBugs2.analyzeApplication(FindBugs2.java:1089) > [java] At edu.umd.cs.findbugs.FindBugs2.execute(FindBugs2.java:283) > [java] At edu.umd.cs.findbugs.FindBugs.runMain(FindBugs.java:393) > [java] At edu.umd.cs.findbugs.FindBugs2.main(FindBugs2.java:1200) > [java] The following classes needed for analysis were missing: > [java] accept > [java] apply > [java] run > [java] test > [java] call > [java] exec > [java] getAsInt > [java] applyAsLong > [java] storeFile > [java] get > [java] visit > [java] compare > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21387) Race condition in snapshot cache refreshing leads to loss of snapshot files
Ted Yu created HBASE-21387: -- Summary: Race condition in snapshot cache refreshing leads to loss of snapshot files Key: HBASE-21387 URL: https://issues.apache.org/jira/browse/HBASE-21387 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu A recent customer report showed ExportSnapshot failing: {code} 2018-10-09 18:54:32,559 ERROR [VerifySnapshot-pool1-t2] snapshot.SnapshotReferenceUtil: Can't find hfile: 44f6c3c646e84de6a63fe30da4fcb3aa in the real (hdfs://in.com:8020/apps/hbase/data/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa) or archive (hdfs://in.com:8020/apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa) directory for the primary table. {code} We found the following in the log: {code} 2018-10-09 18:54:23,675 DEBUG [00:16000.activeMasterManager-HFileCleaner.large-1539035367427] cleaner.HFileCleaner: Removing: hdfs:///apps/hbase/data/archive/data/.../a/44f6c3c646e84de6a63fe30da4fcb3aa from archive {code} The root cause is a race condition surrounding SnapshotFileCache#refreshCache(). There are two callers of refreshCache: one from RefreshCacheTask#run and the other from SnapshotHFileCleaner. Let's look at the code of refreshCache: {code} // if the snapshot directory wasn't modified since we last check, we are done if (dirStatus.getModificationTime() <= this.lastModifiedTime) return; // 1. update the modified time this.lastModifiedTime = dirStatus.getModificationTime(); // 2.clear the cache this.cache.clear(); {code} Suppose the RefreshCacheTask runs past the if check and sets this.lastModifiedTime. The cleaner then executes refreshCache and returns immediately, since this.lastModifiedTime matches the modification time of the directory. Now RefreshCacheTask clears the cache. By the time the cleaner performs the cache lookup, the cache is empty. Therefore the cleaner puts the file into unReferencedFiles - leading to data loss. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
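The interleaving can be modeled in a few lines. Sequential calls stand in for the two threads (RefreshCacheTask, then SnapshotHFileCleaner), and the names are illustrative rather than the actual SnapshotFileCache code: the first caller advances lastModifiedTime and clears the cache before repopulating it, so the second caller's freshness check passes against an empty cache.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal model of the refreshCache() hazard described above. Sequential calls
// stand in for the two racing threads; all names are illustrative.
public class RefreshRaceSketch {
  static long lastModifiedTime = 0;
  static Set<String> cache = new HashSet<>();

  static void refreshCacheBuggy(long dirModTime) {
    if (dirModTime <= lastModifiedTime) {
      return; // second caller: "cache is fresh" -- but it is empty
    }
    lastModifiedTime = dirModTime; // 1. timestamp advanced first
    cache.clear();                 // 2. cache cleared; repopulation would only happen later
  }

  static boolean demo() {
    refreshCacheBuggy(100); // RefreshCacheTask: advances timestamp, clears cache
    refreshCacheBuggy(100); // cleaner: returns early, then consults an empty cache
    return cache.contains("44f6c3c646e84de6a63fe30da4fcb3aa"); // false: file looks unreferenced
  }

  public static void main(String[] args) {
    System.out.println(demo()); // false
  }
}
```

One way to close the window, under the same model, is to advance lastModifiedTime only after the cache has been fully rebuilt, so a concurrent caller either sees the old timestamp (and refreshes) or the new timestamp with a populated cache.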
[jira] [Reopened] (HBASE-21318) Make RefreshHFilesClient runnable
[ https://issues.apache.org/jira/browse/HBASE-21318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-21318: > Make RefreshHFilesClient runnable > - > > Key: HBASE-21318 > URL: https://issues.apache.org/jira/browse/HBASE-21318 > Project: HBase > Issue Type: Improvement > Components: HFile >Affects Versions: 3.0.0, 1.5.0, 2.1.2 >Reporter: Tak Lon (Stephen) Wu >Assignee: Tak Lon (Stephen) Wu >Priority: Minor > Fix For: 3.0.0 > > Attachments: HBASE-21318.master.001.patch, > HBASE-21318.master.002.patch, HBASE-21318.master.003.patch, > HBASE-21318.master.004.patch > > > Other than when user enables hbase.coprocessor.region.classes with > RefreshHFilesEndPoint, user can also run this client as tool runner class/CLI > and calls refresh HFiles directly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-21149) TestIncrementalBackupWithBulkLoad may fail due to file copy failure
[ https://issues.apache.org/jira/browse/HBASE-21149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-21149. Resolution: Duplicate Fix Version/s: (was: 3.0.0) > TestIncrementalBackupWithBulkLoad may fail due to file copy failure > --- > > Key: HBASE-21149 > URL: https://issues.apache.org/jira/browse/HBASE-21149 > Project: HBase > Issue Type: Test > Components: backuprestore >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Critical > Attachments: 21149.v2.txt, HBASE-21149-v1.patch, > testIncrementalBackupWithBulkLoad-output.txt > > > From > https://builds.apache.org/job/HBase%20Nightly/job/master/471/testReport/junit/org.apache.hadoop.hbase.backup/TestIncrementalBackupWithBulkLoad/TestIncBackupDeleteTable/ > : > {code} > 2018-09-03 11:54:30,526 ERROR [Time-limited test] > impl.TableBackupClient(235): Unexpected Exception : Failed copy from > hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/0f626c66493649daaf84057b8dd71a30_SeqId_205_,hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/ad8df6415bd9459d9b3df76c588d79df_SeqId_205_ > to hdfs://localhost:53075/backupUT/backup_1535975655488 > java.io.IOException: Failed copy from > hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/0f626c66493649daaf84057b8dd71a30_SeqId_205_,hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/ad8df6415bd9459d9b3df76c588d79df_SeqId_205_ > to hdfs://localhost:53075/backupUT/backup_1535975655488 > at > org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.incrementalCopyHFiles(IncrementalTableBackupClient.java:351) > at > 
org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.copyBulkLoadedFiles(IncrementalTableBackupClient.java:219) > at > org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.handleBulkLoad(IncrementalTableBackupClient.java:198) > at > org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:320) > at > org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:605) > at > org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.TestIncBackupDeleteTable(TestIncrementalBackupWithBulkLoad.java:104) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > {code} > However, some part of the test output was lost: > {code} > 2018-09-03 11:53:36,793 DEBUG [RS:0;765c9ca5ea28:36357] regions > ...[truncated 398396 chars]... > 8) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21381) Document the hadoop versions using which backup and restore feature works
Ted Yu created HBASE-21381: -- Summary: Document the hadoop versions using which backup and restore feature works Key: HBASE-21381 URL: https://issues.apache.org/jira/browse/HBASE-21381 Project: HBase Issue Type: Task Reporter: Ted Yu HADOOP-15850 fixes a bug where CopyCommitter#concatFileChunks unconditionally tried to concatenate the files being DistCp'ed to the target cluster (though the files are independent). Following is the log snippet of the failed concatenation attempt: {code} 2018-10-13 14:09:25,351 WARN [Thread-936] mapred.LocalJobRunner$Job(590): job_local1795473782_0004 java.io.IOException: Inconsistent sequence file: current chunk file org.apache.hadoop.tools.CopyListingFileStatus@bb8826ee{hdfs://localhost:42796/user/hbase/test-data/ 160aeab5-6bca-9f87-465e-2517a0c43119/data/default/test-1539439707496/96b5a3613d52f4df1ba87a1cef20684c/f/a7599081e835440eb7bf0dd3ef4fd7a5_SeqId_205_ length = 5100 aclEntries = null, xAttrs = null} doesnt match prior entry org.apache.hadoop.tools.CopyListingFileStatus@243d544d{hdfs://localhost:42796/user/hbase/test-data/160aeab5-6bca-9f87-465e- 2517a0c43119/data/default/test-1539439707496/96b5a3613d52f4df1ba87a1cef20684c/f/394e6d39a9b94b148b9089c4fb967aad_SeqId_205_ length = 5142 aclEntries = null, xAttrs = null} at org.apache.hadoop.tools.mapred.CopyCommitter.concatFileChunks(CopyCommitter.java:276) at org.apache.hadoop.tools.mapred.CopyCommitter.commitJob(CopyCommitter.java:100) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:567) {code} Backup and Restore uses DistCp to transfer files between clusters. Without the fix from HADOOP-15850, the transfer would fail. This issue is to document the hadoop versions which contain HADOOP-15850 so that users of the Backup and Restore feature know which hadoop versions they can use. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21353) TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to HBCK2#checkHBCKSupport
Ted Yu created HBASE-21353: -- Summary: TestHBCKCommandLineParsing#testCommandWithOptions hangs on call to HBCK2#checkHBCKSupport Key: HBASE-21353 URL: https://issues.apache.org/jira/browse/HBASE-21353 Project: HBase Issue Type: Test Reporter: Ted Yu I noticed the following when running TestHBCKCommandLineParsing#testCommandWithOptions : {code} "main" #1 prio=5 os_prio=31 tid=0x7f851c80 nid=0x1703 waiting on condition [0x70216000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00076d3055d8> (a java.util.concurrent.CompletableFuture$Signaller) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:564) at org.apache.hadoop.hbase.client.ConnectionImplementation.(ConnectionImplementation.java:297) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$0(ConnectionFactory.java:229) at org.apache.hadoop.hbase.client.ConnectionFactory$$Lambda$11/502838712.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686) at 
org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:347) at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:227) at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:127) at org.apache.hbase.HBCK2.checkHBCKSupport(HBCK2.java:93) at org.apache.hbase.HBCK2.run(HBCK2.java:352) at org.apache.hbase.TestHBCKCommandLineParsing.testCommandWithOptions(TestHBCKCommandLineParsing.java:62) {code} The test doesn't spin up an hbase cluster, so the call to check hbck support hangs. In HBCK2#run, we can refactor the code such that argument parsing is done prior to calling HBCK2#checkHBCKSupport. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
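A sketch of the suggested refactor, with illustrative method names rather than HBCK2's actual API: parse and validate the command line first, so a test that only exercises option parsing never reaches the cluster connection that would otherwise block in retrieveClusterId.

```java
// Sketch of "parse arguments before checking cluster support". Names are
// illustrative, not HBCK2's API. An invalid or empty command line returns a
// usage error before any connection is attempted, so option-parsing tests
// cannot hang waiting on a cluster.
public class ParseFirstSketch {
  static int run(String[] args) {
    // 1. Argument parsing/validation happens first and fails fast.
    if (args.length == 0 || args[0].startsWith("-")) {
      return 1; // usage error; no cluster connection was attempted
    }
    // 2. Only a recognized command would reach the (blocking) cluster check:
    // checkClusterSupport(createConnection());  // hypothetical, would block without a cluster
    return 0;
  }

  public static void main(String[] args) {
    System.out.println(run(new String[] {"--unknownOption"})); // 1: rejected without connecting
  }
}
```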
[jira] [Reopened] (HBASE-21281) Update bouncycastle dependency.
[ https://issues.apache.org/jira/browse/HBASE-21281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-21281: > Update bouncycastle dependency. > --- > > Key: HBASE-21281 > URL: https://issues.apache.org/jira/browse/HBASE-21281 > Project: HBase > Issue Type: Task > Components: dependencies, test >Reporter: Josh Elser >Assignee: Josh Elser >Priority: Major > Fix For: 3.0.0, 2.2.0 > > Attachments: 21281.addendum.patch, 21281.addendum2.patch, > HBASE-21281.001.branch-2.0.patch > > > Looks like we still depend on bcprov-jdk16 for some x509 certificate > generation in our tests. Bouncycastle has moved beyond this in 1.47, changing > the artifact names. > [http://www.bouncycastle.org/wiki/display/JA1/Porting+from+earlier+BC+releases+to+1.47+and+later] > There are some API changes too, but it looks like we don't use any of these. > It seems like we also have vestiges in the POMs from when we were depending > on a specific BC version that came in from Hadoop. We now have a > KeyStoreTestUtil class in HBase, which makes me think we can also clean up > some dependencies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21341) DeadServer shouldn't import unshaded Preconditions
Ted Yu created HBASE-21341: -- Summary: DeadServer shouldn't import unshaded Preconditions Key: HBASE-21341 URL: https://issues.apache.org/jira/browse/HBASE-21341 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu DeadServer currently imports unshaded Preconditions: {code} import com.google.common.base.Preconditions; {code} We should import the shaded version of Preconditions. This is the only place where an unshaded class from com.google.common is imported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21279) Split TestAdminShell into several tests
Ted Yu created HBASE-21279: -- Summary: Split TestAdminShell into several tests Key: HBASE-21279 URL: https://issues.apache.org/jira/browse/HBASE-21279 Project: HBase Issue Type: Test Reporter: Ted Yu In the flaky test board, TestAdminShell often timed out (https://builds.apache.org/job/HBASE-Find-Flaky-Tests/job/branch-2/lastSuccessfulBuild/artifact/dashboard.html). I ran the test on Linux with SSD and reproduced the timeout (see attached test output). {code} 2018-10-08 02:36:09,146 DEBUG [main] hbase.HBaseTestingUtility(351): Setting hbase.rootdir to /mnt/disk2/a/2-hbase/hbase-shell/target/test-data/a103d8e4-695c-a5a9-6690-1ef2580050f9 ... 2018-10-08 02:49:09,093 DEBUG [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=7] master.MasterRpcServices(1171): Checking to see if procedure is done pid=871 Took 0.7262 seconds2018-10-08 02:49:09,324 DEBUG [PEWorker-1] util.FSTableDescriptors(684): Wrote into hdfs://localhost:43859/user/hbase/test-data/cefc73d9-cc37-d2a6-b92b- d935316c9241/.tmp/data/default/hbase_shell_tests_table/.tabledesc/.tableinfo.01 2018-10-08 02:49:09,328 INFO [RegionOpenAndInitThread-hbase_shell_tests_table-1] regionserver.HRegion(7004): creating HRegion hbase_shell_tests_table HTD == 'hbase_shell_tests_table', {NAME => 'x', VERSIONS => '5', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}, {NAME => 'y', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', 
CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'} RootDir = hdfs://localhost:43859/ user/hbase/test-data/cefc73d9-cc37-d2a6-b92b-d935316c9241/.tmp Table name == hbase_shell_tests_table E === Error: test_Get_simple_status(Hbase::StatusTest): Java::JavaIo::InterruptedIOException: Interrupt while waiting on Operation: CREATE, Table Name: default:hbase_shell_tests_table, procId: 871 2018-10-08 02:49:09,361 INFO [Block report processor] blockmanagement.BlockManager(2645): BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:41338 is added to blk_1073742193_1369{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-ecc89143-e0a5-4a1c-b552-120be2561334:NORMAL:127.0.0.1: 41338|RBW]]} size 58 > TEST TIMED OUT. PRINTING THREAD DUMP. < {code} We can see that the procedure #871 wasn't stuck - the timeout cut in and stopped the test. We should separate the current test into two (or more) test files (with corresponding .rb) so that the execution time consistently stays within the limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21272) Re-add assertions for RS Group admin tests
Ted Yu created HBASE-21272: -- Summary: Re-add assertions for RS Group admin tests Key: HBASE-21272 URL: https://issues.apache.org/jira/browse/HBASE-21272 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Fix For: 1.5.0 The checked-in version of HBASE-21258 for branch-1 didn't include assertions for the add / remove RS group coprocessor hook calls. This issue is to add those assertions to the corresponding tests in TestRSGroupsAdmin1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations
[ https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-21221: > Ineffective assertion in TestFromClientSide3#testMultiRowMutations > -- > > Key: HBASE-21221 > URL: https://issues.apache.org/jira/browse/HBASE-21221 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Fix For: 3.0.0 > > Attachments: 21221.addendum.txt, 21221.v10.txt, 21221.v11.txt, > 21221.v12.txt, 21221.v7.txt, 21221.v8.txt, 21221.v9.txt > > > Observed the following in > org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt : > {code} > Caused by: > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): > java.io.IOException: Timed out waiting for lock for row: ROW-1 in region > 089bdfa75f44d88e596479038a6da18b > at > org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816) > at > org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432) > at > org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008) > at > org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982) > at > org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424) > at > org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116) > at > org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266) > at > org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463) > ... 
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp > should fail because the target lock is blocked by previous put > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > {code} > Here is related code: > {code} > cpService.execute(() -> { > ... > if (!threw) { > // Can't call fail() earlier because the catch would eat it. > fail("This cp should fail because the target lock is blocked by > previous put"); > } > {code} > Since the fail() call is executed by the cpService, the assertion had no > bearing on the outcome of the test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21261) Add log4j.properties for hbase-rsgroup tests
Ted Yu created HBASE-21261: -- Summary: Add log4j.properties for hbase-rsgroup tests Key: HBASE-21261 URL: https://issues.apache.org/jira/browse/HBASE-21261 Project: HBase Issue Type: Test Reporter: Ted Yu When I tried to debug TestRSGroups, at first I couldn't find any DEBUG log. It turns out that there is no log4j.properties under hbase-rsgroup/src/test/resources. This issue adds log4j.properties for the hbase-rsgroup tests, which will be useful when finding the root cause of hbase-rsgroup test failure(s). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-21207) Add client side sorting functionality in master web UI for table and region server details.
[ https://issues.apache.org/jira/browse/HBASE-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-21207: > Add client side sorting functionality in master web UI for table and region > server details. > --- > > Key: HBASE-21207 > URL: https://issues.apache.org/jira/browse/HBASE-21207 > Project: HBase > Issue Type: Improvement > Components: master, monitoring, UI, Usability >Reporter: Archana Katiyar >Assignee: Archana Katiyar >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8 > > Attachments: 14926e82-b929-11e8-8bdd-4ce4621f1118.png, > 21207.branch-1.addendum.patch, 2724afd8-b929-11e8-8171-8b5b2ba3084e.png, > HBASE-21207-branch-1.patch, HBASE-21207-branch-1.v1.patch, > HBASE-21207-branch-2.v1.patch, HBASE-21207.patch, HBASE-21207.patch, > HBASE-21207.v1.patch, edc5c812-b928-11e8-87e2-ce6396629bbc.png > > > In Master UI, we can see region server details like requests per seconds and > number of regions etc. Similarly, for tables also we can see online regions , > offline regions. > It will help ops people in determining hot spotting if we can provide sort > functionality in the UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups
Ted Yu created HBASE-21258: -- Summary: Add resetting of flags for RS Group pre/post hooks in TestRSGroups Key: HBASE-21258 URL: https://issues.apache.org/jira/browse/HBASE-21258 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS Group pre/post hooks in TestRSGroups was absent. This issue is to add the resetting of these flags before each subtest starts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
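The flag-resetting pattern behind these RS Group tests can be sketched in plain Java. All names below (the flags, resetFlags, addRSGroup) are hypothetical stand-ins rather than the actual TestRSGroups code: the observer flips flags in its pre/post hooks, each subtest resets the flags first, and the assertions then verify that the hooks actually fired for the operation under test.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class HookFlagSketch {
  public static final AtomicBoolean preAddRSGroupCalled = new AtomicBoolean(false);
  public static final AtomicBoolean postAddRSGroupCalled = new AtomicBoolean(false);

  // Run before each subtest so a hook call left over from an earlier
  // subtest cannot make a later assertion pass spuriously.
  public static void resetFlags() {
    preAddRSGroupCalled.set(false);
    postAddRSGroupCalled.set(false);
  }

  public static void addRSGroup(String name) {
    preAddRSGroupCalled.set(true);   // stand-in for the pre-hook firing
    // ... actual group creation would happen here ...
    postAddRSGroupCalled.set(true);  // stand-in for the post-hook firing
  }
}
```

Without the resetFlags() call at the start of each subtest, a flag set by a previous subtest stays true, which is exactly how assertions lose their meaning.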
[jira] [Created] (HBASE-21247) Allow WAL Provider to be specified by configuration without explicit enum in Providers
Ted Yu created HBASE-21247: -- Summary: Allow WAL Provider to be specified by configuration without explicit enum in Providers Key: HBASE-21247 URL: https://issues.apache.org/jira/browse/HBASE-21247 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Attachments: 21247.v1.txt Currently all the WAL Providers acceptable to hbase are specified in the Providers enum of WALFactory. This prevents additional WAL Providers from being supplied by class name. This issue introduces an additional config which allows a new WAL Provider to be specified through its class name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
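The idea of resolving a provider from a class name rather than a fixed enum can be sketched with plain Java reflection. The interface, class, and config key below are hypothetical stand-ins, not the actual WALFactory API:

```java
public class ProviderLoaderSketch {
  public interface WalProvider { String name(); }

  public static class FileWalProvider implements WalProvider {
    public String name() { return "file"; }
  }

  // In real code the class name would come from configuration,
  // e.g. a (hypothetical) conf.get("hbase.wal.provider.class").
  // Reflection removes the need to enumerate every provider up front.
  public static WalProvider load(String className) throws Exception {
    Class<?> clazz = Class.forName(className);
    return (WalProvider) clazz.getDeclaredConstructor().newInstance();
  }
}
```

A typical flow: first try to match the configured value against the known enum names, and only fall back to Class.forName when no enum constant matches.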
[jira] [Created] (HBASE-21246) Introduce WALIdentity interface
Ted Yu created HBASE-21246: -- Summary: Introduce WALIdentity interface Key: HBASE-21246 URL: https://issues.apache.org/jira/browse/HBASE-21246 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Assignee: Ted Yu We are introducing the WALIdentity interface so that the WAL representation can be decoupled from the distributed filesystem. The interface provides a getName method whose return value can represent a filename in a distributed filesystem environment, or the name of the stream when the WAL is backed by a log stream. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
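A minimal sketch of the decoupling described above, in plain Java. These types are illustrative stand-ins (the real interface lives in the HBase patch, not here): one implementation names a filesystem path, the other names a stream, and callers only see getName().

```java
public class WalIdentitySketch {
  // The abstraction: callers identify a WAL without assuming it is a file.
  public interface WALIdentity {
    String getName();
  }

  // Backed by a distributed filesystem: the name is a path string.
  public static class FsWalIdentity implements WALIdentity {
    private final String path;
    public FsWalIdentity(String path) { this.path = path; }
    public String getName() { return path; }
  }

  // Backed by a log stream: the name is the stream's name.
  public static class StreamWalIdentity implements WALIdentity {
    private final String streamName;
    public StreamWalIdentity(String streamName) { this.streamName = streamName; }
    public String getName() { return streamName; }
  }
}
```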
[jira] [Created] (HBASE-21238) MapReduceHFileSplitterJob#run shouldn't call System.exit
Ted Yu created HBASE-21238: -- Summary: MapReduceHFileSplitterJob#run shouldn't call System.exit Key: HBASE-21238 URL: https://issues.apache.org/jira/browse/HBASE-21238 Project: HBase Issue Type: Bug Reporter: Ted Yu {code} if (args.length < 2) { usage("Wrong number of arguments: " + args.length); System.exit(-1); {code} The correct way to handle this error condition is through the return value of the run method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
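A minimal sketch of the Tool-style pattern, in plain Java with hypothetical names and no Hadoop dependency: bad arguments are signaled through run()'s return value so callers (a ToolRunner-style driver, or a test) can observe the failure, instead of System.exit(-1) killing the whole JVM, test runner included.

```java
public class SplitterJobSketch {

  // Returns a non-zero exit code on bad usage rather than exiting the JVM,
  // leaving the decision of what to do with the failure to the caller.
  public static int run(String[] args) {
    if (args.length < 2) {
      System.err.println("Wrong number of arguments: " + args.length);
      return -1;
    }
    // ... real job submission would go here ...
    return 0;
  }

  public static void main(String[] args) {
    // Only the outermost entry point translates the code into a process exit.
    System.exit(run(args));
  }
}
```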
[jira] [Created] (HBASE-21230) BackupUtils#checkTargetDir doesn't compose error message correctly
Ted Yu created HBASE-21230: -- Summary: BackupUtils#checkTargetDir doesn't compose error message correctly Key: HBASE-21230 URL: https://issues.apache.org/jira/browse/HBASE-21230 Project: HBase Issue Type: Bug Components: backuprestore Reporter: Ted Yu Here is related code: {code} String expMsg = e.getMessage(); String newMsg = null; if (expMsg.contains("No FileSystem for scheme")) { newMsg = "Unsupported filesystem scheme found in the backup target url. Error Message: " + newMsg; {code} I think the intention was to concatenate expMsg at the end of newMsg. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
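A sketch of the intended behavior in plain Java (the method name here is a hypothetical stand-in for the logic inside BackupUtils#checkTargetDir): the fix is to append the original exception message expMsg, not the still-null newMsg.

```java
public class BackupMessageSketch {

  public static String buildErrorMessage(String expMsg) {
    String newMsg = null;
    if (expMsg != null && expMsg.contains("No FileSystem for scheme")) {
      // The buggy version concatenated newMsg (still null) here, which
      // produced "... Error Message: null"; expMsg is what was intended.
      newMsg = "Unsupported filesystem scheme found in the backup target url."
          + " Error Message: " + expMsg;
    }
    return newMsg;
  }
}
```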
[jira] [Resolved] (HBASE-16627) AssignmentManager#isDisabledorDisablingRegionInRIT should check whether table exists
[ https://issues.apache.org/jira/browse/HBASE-16627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-16627. Resolution: Later > AssignmentManager#isDisabledorDisablingRegionInRIT should check whether table > exists > > > Key: HBASE-16627 > URL: https://issues.apache.org/jira/browse/HBASE-16627 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Stephen Yuan Jiang >Priority: Minor > > [~stack] first reported this issue when he played with backup feature. > The following exception can be observed in backup unit tests: > {code} > 2016-09-13 16:21:57,661 ERROR [ProcedureExecutor-3] > master.TableStateManager(134): Unable to get table hbase:backup state > org.apache.hadoop.hbase.TableNotFoundException: hbase:backup > at > org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:174) > at > org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:131) > at > org.apache.hadoop.hbase.master.AssignmentManager.isDisabledorDisablingRegionInRIT(AssignmentManager.java:1221) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:739) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1567) > at > org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1546) > at > org.apache.hadoop.hbase.util.ModifyRegionUtils.assignRegions(ModifyRegionUtils.java:254) > at > org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.assignRegions(CreateTableProcedure.java:430) > at > org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:127) > at > org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:57) > at > org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:119) > at > org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:452) > at > 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1066) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:855) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:808) > {code} > AssignmentManager#isDisabledorDisablingRegionInRIT should take table > existence into account. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations
Ted Yu created HBASE-21221: -- Summary: Ineffective assertion in TestFromClientSide3#testMultiRowMutations Key: HBASE-21221 URL: https://issues.apache.org/jira/browse/HBASE-21221 Project: HBase Issue Type: Test Reporter: Ted Yu Observed the following in org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt : {code} Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 089bdfa75f44d88e596479038a6da18b at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816) at org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982) at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424) at org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116) at org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266) at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182) at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481) at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463) ... 
Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp should fail because the target lock is blocked by previous put at org.junit.Assert.fail(Assert.java:88) at org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) {code} Here is related code: {code} cpService.execute(() -> { ... if (!threw) { // Can't call fail() earlier because the catch would eat it. fail("This cp should fail because the target lock is blocked by previous put"); } {code} Since the fail() call is executed by the cpService, the assertion had no bearing on the outcome of the test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
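One way to make a worker-thread assertion count, sketched in plain Java without HBase or JUnit (names are illustrative, not the test's actual fix): run the check through ExecutorService.submit() and call Future.get() on the test thread, so any AssertionError thrown in the pool is rethrown where the test framework can see it instead of dying in an uncaught-exception handler.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WorkerAssertSketch {

  // Returns true when the worker's assertion failure propagated back
  // to the calling thread via Future.get().
  public static boolean failurePropagates() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<?> result = pool.submit(() -> {
        boolean threw = false;  // stand-in for the test's flag
        if (!threw) {
          throw new AssertionError(
              "This cp should fail because the target lock is blocked by previous put");
        }
      });
      result.get();   // rethrows the worker's failure wrapped in ExecutionException
      return false;   // only reached if the worker did not fail
    } catch (ExecutionException e) {
      return e.getCause() instanceof AssertionError;
    } catch (InterruptedException e) {
      return false;
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Had the fail() simply run inside the pool thread, as in the quoted code, the AssertionError would never reach the test thread and the test would pass regardless.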
[jira] [Created] (HBASE-21216) TestSnapshotFromMaster#testSnapshotHFileArchiving is flaky
Ted Yu created HBASE-21216: -- Summary: TestSnapshotFromMaster#testSnapshotHFileArchiving is flaky Key: HBASE-21216 URL: https://issues.apache.org/jira/browse/HBASE-21216 Project: HBase Issue Type: Test Reporter: Ted Yu >From >https://builds.apache.org/job/HBase-Flaky-Tests/job/branch-2/794/testReport/junit/org.apache.hadoop.hbase.master.cleaner/TestSnapshotFromMaster/testSnapshotHFileArchiving/ > : {code} java.lang.AssertionError: Archived hfiles [] and table hfiles [9ca09392705f425f9c916beedc10d63c] is missing snapshot file:6739a09747e54189a4112a6d8f37e894 at org.apache.hadoop.hbase.master.cleaner.TestSnapshotFromMaster.testSnapshotHFileArchiving(TestSnapshotFromMaster.java:370) {code} The file appeared in archive dir before hfile cleaners were run: {code} 2018-09-20 10:38:53,187 DEBUG [Time-limited test] util.CommonFSUtils(771): |-archive/ 2018-09-20 10:38:53,188 DEBUG [Time-limited test] util.CommonFSUtils(771): |data/ 2018-09-20 10:38:53,189 DEBUG [Time-limited test] util.CommonFSUtils(771): |---default/ 2018-09-20 10:38:53,190 DEBUG [Time-limited test] util.CommonFSUtils(771): |--test/ 2018-09-20 10:38:53,191 DEBUG [Time-limited test] util.CommonFSUtils(771): |-1237d57b63a7bdf067a930441a02514a/ 2018-09-20 10:38:53,192 DEBUG [Time-limited test] util.CommonFSUtils(771): |recovered.edits/ 2018-09-20 10:38:53,193 DEBUG [Time-limited test] util.CommonFSUtils(774): |---4.seqid 2018-09-20 10:38:53,193 DEBUG [Time-limited test] util.CommonFSUtils(771): |-29e1700e09b51223ad2f5811105a4d51/ 2018-09-20 10:38:53,194 DEBUG [Time-limited test] util.CommonFSUtils(771): |fam/ 2018-09-20 10:38:53,195 DEBUG [Time-limited test] util.CommonFSUtils(774): |---2c66a18f6c1a4074b84ffbb3245268c4 2018-09-20 10:38:53,196 DEBUG [Time-limited test] util.CommonFSUtils(774): |---45bb396c6a5e49629e45a4d56f1e9b14 2018-09-20 10:38:53,196 DEBUG [Time-limited test] util.CommonFSUtils(774): |---6739a09747e54189a4112a6d8f37e894 {code} However, the archive dir became empty after hfile cleaners were 
run: {code} 2018-09-20 10:38:53,312 DEBUG [Time-limited test] util.CommonFSUtils(771): |-archive/ 2018-09-20 10:38:53,313 DEBUG [Time-limited test] util.CommonFSUtils(771): |-corrupt/ {code} Leading to the assertion failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21198) Exclude dependency on net.minidev:json-smart
Ted Yu created HBASE-21198: -- Summary: Exclude dependency on net.minidev:json-smart Key: HBASE-21198 URL: https://issues.apache.org/jira/browse/HBASE-21198 Project: HBase Issue Type: Task Reporter: Ted Yu From https://builds.apache.org/job/PreCommit-HBASE-Build/14414/artifact/patchprocess/patch-javac-3.0.0.txt : {code} [ERROR] Failed to execute goal on project hbase-common: Could not resolve dependencies for project org.apache.hbase:hbase-common:jar:3.0.0-SNAPSHOT: Failed to collect dependencies at org.apache.hadoop:hadoop-common:jar:3.0.0 -> org.apache.hadoop:hadoop-auth:jar:3.0.0 -> com.nimbusds:nimbus-jose-jwt:jar:4.41.1 -> net.minidev:json-smart:jar:2.3-SNAPSHOT: Failed to read artifact descriptor for net.minidev:json-smart:jar:2.3-SNAPSHOT: Could not transfer artifact net.minidev:json-smart:pom:2.3-SNAPSHOT from/to dynamodb-local-oregon (https://s3-us-west-2.amazonaws.com/dynamodb-local/release): Access denied to: https://s3-us-west-2.amazonaws.com/dynamodb-local/release/net/minidev/json-smart/2.3-SNAPSHOT/json-smart-2.3-SNAPSHOT.pom , ReasonPhrase:Forbidden. -> [Help 1] {code} We should exclude the dependency on net.minidev:json-smart. hbase-common/bin/pom.xml has done so; the other pom.xml files should do the same. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21194) Add TestCopyTable which exercises MOB feature
Ted Yu created HBASE-21194: -- Summary: Add TestCopyTable which exercises MOB feature Key: HBASE-21194 URL: https://issues.apache.org/jira/browse/HBASE-21194 Project: HBase Issue Type: Test Reporter: Ted Yu Currently TestCopyTable doesn't cover table(s) with the MOB feature enabled. We should add a variant that enables MOB on the table being copied and verifies that the MOB content is copied correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21180) findbugs incurs DataflowAnalysisException for hbase-server module
Ted Yu created HBASE-21180: -- Summary: findbugs incurs DataflowAnalysisException for hbase-server module Key: HBASE-21180 URL: https://issues.apache.org/jira/browse/HBASE-21180 Project: HBase Issue Type: Task Reporter: Ted Yu Running findbugs, I noticed the following in hbase-server module: {code} [INFO] --- findbugs-maven-plugin:3.0.4:findbugs (default-cli) @ hbase-server --- [INFO] Fork Value is true [java] The following errors occurred during analysis: [java] Error generating derefs for org.apache.hadoop.hbase.generated.master.table_jsp._jspService(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V [java] edu.umd.cs.findbugs.ba.DataflowAnalysisException: can't get position -1 of stack [java] At edu.umd.cs.findbugs.ba.Frame.getStackValue(Frame.java:250) [java] At edu.umd.cs.findbugs.ba.Hierarchy.resolveMethodCallTargets(Hierarchy.java:743) [java] At edu.umd.cs.findbugs.ba.npe.DerefFinder.getAnalysis(DerefFinder.java:141) [java] At edu.umd.cs.findbugs.classfile.engine.bcel.UsagesRequiringNonNullValuesFactory.analyze(UsagesRequiringNonNullValuesFactory.java:50) [java] At edu.umd.cs.findbugs.classfile.engine.bcel.UsagesRequiringNonNullValuesFactory.analyze(UsagesRequiringNonNullValuesFactory.java:31) [java] At edu.umd.cs.findbugs.classfile.impl.AnalysisCache.analyzeMethod(AnalysisCache.java:369) [java] At edu.umd.cs.findbugs.classfile.impl.AnalysisCache.getMethodAnalysis(AnalysisCache.java:322) [java] At edu.umd.cs.findbugs.ba.ClassContext.getMethodAnalysis(ClassContext.java:1005) [java] At edu.umd.cs.findbugs.ba.ClassContext.getUsagesRequiringNonNullValues(ClassContext.java:325) [java] At edu.umd.cs.findbugs.detect.FindNullDeref.foundGuaranteedNullDeref(FindNullDeref.java:1510) [java] At edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.reportBugs(NullDerefAndRedundantComparisonFinder.java:361) [java] At 
edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.examineNullValues(NullDerefAndRedundantComparisonFinder.java:266) [java] At edu.umd.cs.findbugs.ba.npe.NullDerefAndRedundantComparisonFinder.execute(NullDerefAndRedundantComparisonFinder.java:164) [java] At edu.umd.cs.findbugs.detect.FindNullDeref.analyzeMethod(FindNullDeref.java:278) [java] At edu.umd.cs.findbugs.detect.FindNullDeref.visitClassContext(FindNullDeref.java:209) [java] At edu.umd.cs.findbugs.DetectorToDetector2Adapter.visitClass(DetectorToDetector2Adapter.java:76) [java] At edu.umd.cs.findbugs.FindBugs2.analyzeApplication(FindBugs2.java:1089) [java] At edu.umd.cs.findbugs.FindBugs2.execute(FindBugs2.java:283) [java] At edu.umd.cs.findbugs.FindBugs.runMain(FindBugs.java:393) [java] At edu.umd.cs.findbugs.FindBugs2.main(FindBugs2.java:1200) [java] The following classes needed for analysis were missing: [java] accept [java] apply [java] run [java] test [java] call [java] exec [java] getAsInt [java] applyAsLong [java] storeFile [java] get [java] visit [java] compare {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21175) Partially initialized SnapshotHFileCleaner leads to NPE during TestHFileArchiving
Ted Yu created HBASE-21175: -- Summary: Partially initialized SnapshotHFileCleaner leads to NPE during TestHFileArchiving Key: HBASE-21175 URL: https://issues.apache.org/jira/browse/HBASE-21175 Project: HBase Issue Type: Test Reporter: Ted Yu TestHFileArchiving#testCleaningRace creates an HFileCleaner instance within the test. When SnapshotHFileCleaner.init() is called, there is no master parameter passed in {{params}}. When the chore runs the cleaner during the test, an NPE comes out of this line in getDeletableFiles(): {code} return cache.getUnreferencedFiles(files, master.getSnapshotManager()); {code} since master is null. We should either check for the null master or pass the master instance properly when constructing the cleaner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
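The defensive option can be sketched in plain Java (the Master interface and method below are hypothetical stand-ins, not HBase types): when the cleaner was constructed without a master, conservatively report nothing as deletable instead of dereferencing null.

```java
import java.util.Collections;
import java.util.List;

public class CleanerGuardSketch {
  // Hypothetical stand-in for the master services the cleaner needs.
  public interface Master { Object getSnapshotManager(); }

  // With no master we cannot consult the snapshot manager, so we treat
  // every file as still referenced (i.e. not deletable) rather than NPE.
  public static List<String> getDeletableFiles(List<String> files, Master master) {
    if (master == null) {
      return Collections.emptyList();
    }
    // ... real code would consult the snapshot file cache with
    // master.getSnapshotManager() to filter unreferenced files ...
    return files;
  }
}
```

Failing closed here matters: a cleaner that deletes nothing is merely slow, while one that NPEs (or deletes everything) loses data or crashes the chore.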
[jira] [Resolved] (HBASE-21129) Clean up duplicate codes in #equals and #hashCode methods of Filter
[ https://issues.apache.org/jira/browse/HBASE-21129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-21129. Resolution: Fixed > Clean up duplicate codes in #equals and #hashCode methods of Filter > --- > > Key: HBASE-21129 > URL: https://issues.apache.org/jira/browse/HBASE-21129 > Project: HBase > Issue Type: Improvement > Components: Filters >Affects Versions: 3.0.0, 2.2.0 >Reporter: Reid Chan >Assignee: Reid Chan >Priority: Minor > Fix For: 3.0.0, 2.2.0 > > Attachments: 21129.addendum, HBASE-21129.master.001.patch, > HBASE-21129.master.002.patch, HBASE-21129.master.003.patch, > HBASE-21129.master.004.patch, HBASE-21129.master.005.patch, > HBASE-21129.master.006.patch, HBASE-21129.master.007.patch, > HBASE-21129.master.008.patch > > > It is a follow-up of HBASE-19008, aiming to clean up duplicate codes in > #equals and #hashCode methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21160) Assertion in TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels is ignored
Ted Yu created HBASE-21160: -- Summary: Assertion in TestVisibilityLabelsWithDeletes#testDeleteColumnsWithoutAndWithVisibilityLabels is ignored Key: HBASE-21160 URL: https://issues.apache.org/jira/browse/HBASE-21160 Project: HBase Issue Type: Test Reporter: Ted Yu >From >https://builds.apache.org/job/PreCommit-HBASE-Build/14327/artifact/patchprocess/diff-compile-javac-hbase-server.txt > (HBASE-21138 QA run): {code} [WARNING] /testptch/hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithDeletes.java:[315,25] [AssertionFailureIgnored] This assertion throws an AssertionError if it fails, which will be caught by an enclosing try block. {code} Here is related code: {code} PrivilegedExceptionAction scanAction = new PrivilegedExceptionAction() { @Override public Void run() throws Exception { try (Connection connection = ConnectionFactory.createConnection(conf); ... assertEquals(1, next.length); } catch (Throwable t) { throw new IOException(t); } {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
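The shape of a fix for this warning can be sketched in plain Java (this is an illustrative pattern, not the actual TestVisibilityLabelsWithDeletes change): catch AssertionError separately and rethrow it untouched, so the enclosing catch (Throwable) no longer converts an assertion failure into a wrapped exception.

```java
public class AssertionEscapeSketch {
  // Runs a test body; lets AssertionError escape instead of being
  // converted, so the test framework still reports an assertion failure.
  public static void checkedRun(Runnable body) throws Exception {
    try {
      body.run();
    } catch (AssertionError e) {
      throw e;                  // don't swallow or re-wrap assertion failures
    } catch (Throwable t) {
      throw new Exception(t);   // other failures keep the wrapping behavior
    }
  }
}
```

The alternative fix is to move the assertEquals outside the try block entirely; either way the AssertionError must not end up as the cause of a generic IOException.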
[jira] [Reopened] (HBASE-21150) Avoid delay in first flushes due to overheads in table metrics registration
[ https://issues.apache.org/jira/browse/HBASE-21150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-21150: I didn't open this issue for backporting. HBASE-15728 is still in master and the delay in first flushes is still there. > Avoid delay in first flushes due to overheads in table metrics registration > --- > > Key: HBASE-21150 > URL: https://issues.apache.org/jira/browse/HBASE-21150 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Attachments: 21150.v1.txt, 21150.v2.txt, 21150.v3.txt > > > After HBASE-15728 is integrated, the lazy table metrics registration results > in penalty for the first flushes. > Excerpt from log shows delay (note the same timestamp 08:18:23,234) : > {code:java} > 2018-09-02 08:18:23,232 DEBUG > [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] > regionserver.MetricsTableSourceImpl(124): Creating new > MetricsTableSourceImpl for table 'testtb-1535901500805' > 2018-09-02 08:18:23,233 DEBUG > [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] > regionserver.MetricsTableSourceImpl(137): registering metrics for testtb- > 1535901500805 > 2018-09-02 08:18:23,234 INFO > [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1] > regionserver.HRegion(2822): Finished flush of dataSize ~2.29 KB/2343, > heapSize ~5.16 KB/5280, currentSize=0 B/0 for > fa403f6a4fb8dbc1a1c389744fce2d58 in 280ms, sequenceid=5, compaction > requested=false > 2018-09-02 08:18:23,234 DEBUG > [rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-1] > regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register > testtb-1535901500805 > Thread[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-1,5,FailOnTimeoutGroup] > 2018-09-02 08:18:23,234 DEBUG > [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1] > regionserver.MetricsTableAggregateSourceImpl(84): it took 0 ms to 
register > testtb-1535901500805 > Thread[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1,5,FailOnTimeoutGroup] > 2018-09-02 08:18:23,234 DEBUG > [rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-1] > regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register > testtb-1535901500805 > Thread[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-1,5,FailOnTimeoutGroup] > 2018-09-02 08:18:23,234 DEBUG > [rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-2] > regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register > testtb-1535901500805 > Thread[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-2,5,FailOnTimeoutGroup] > 2018-09-02 08:18:23,234 DEBUG > [rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-2] > regionserver.MetricsTableAggregateSourceImpl(84): it took 5 ms to register > testtb-1535901500805 > Thread[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-2,5,FailOnTimeoutGroup] > 2018-09-02 08:18:23,234 DEBUG > [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] > regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register > testtb-1535901500805 > Thread[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2,5,FailOnTimeoutGroup] > {code} > This is a regression. > When first region of the table is opened on region server, we can proactively > register table metrics. > This would avoid the penalty on first flushes for the table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21150) Avoid delay in first flushes due to contention in table metrics registration
Ted Yu created HBASE-21150: -- Summary: Avoid delay in first flushes due to contention in table metrics registration Key: HBASE-21150 URL: https://issues.apache.org/jira/browse/HBASE-21150 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu After HBASE-15728 is integrated, the lazy table metrics registration results in penalty for the first flushes. Excerpt from log shows delay (note the same timestamp 08:18:23,234) : {code} 2018-09-02 08:18:23,232 DEBUG [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] regionserver.MetricsTableSourceImpl(124): Creating new MetricsTableSourceImpl for table 'testtb-1535901500805' 2018-09-02 08:18:23,233 DEBUG [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] regionserver.MetricsTableSourceImpl(137): registering metrics for testtb- 1535901500805 2018-09-02 08:18:23,234 INFO [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1] regionserver.HRegion(2822): Finished flush of dataSize ~2.29 KB/2343, heapSize ~5.16 KB/5280, currentSize=0 B/0 for fa403f6a4fb8dbc1a1c389744fce2d58 in 280ms, sequenceid=5, compaction requested=false 2018-09-02 08:18:23,234 DEBUG [rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-1] regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register testtb-1535901500805 Thread[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-1,5,FailOnTimeoutGroup] 2018-09-02 08:18:23,234 DEBUG [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1] regionserver.MetricsTableAggregateSourceImpl(84): it took 0 ms to register testtb-1535901500805 Thread[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-1,5,FailOnTimeoutGroup] 2018-09-02 08:18:23,234 DEBUG [rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-1] regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register testtb-1535901500805 
Thread[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-1,5,FailOnTimeoutGroup] 2018-09-02 08:18:23,234 DEBUG [rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-2] regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register testtb-1535901500805 Thread[rs(hw13463.attlocal.net,52762,1535901497314)-snapshot-pool9-thread-2,5,FailOnTimeoutGroup] 2018-09-02 08:18:23,234 DEBUG [rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-2] regionserver.MetricsTableAggregateSourceImpl(84): it took 5 ms to register testtb-1535901500805 Thread[rs(hw13463.attlocal.net,52758,1535901497238)-snapshot-pool11-thread-2,5,FailOnTimeoutGroup] 2018-09-02 08:18:23,234 DEBUG [rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2] regionserver.MetricsTableAggregateSourceImpl(84): it took 6 ms to register testtb-1535901500805 Thread[rs(hw13463.attlocal.net,52760,1535901497280)-snapshot-pool10-thread-2,5,FailOnTimeoutGroup] {code} This is a regression. When first region of the table is opened on region server, we can proactively register table metrics. This would avoid the penalty on first flushes for the table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
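The proposed eager registration at region open can be sketched in plain Java (names are hypothetical stand-ins, not the actual MetricsTableAggregateSourceImpl): registering through ConcurrentHashMap.computeIfAbsent when the first region of a table opens means the work happens exactly once per table, before any flush has to pay for it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class TableMetricsSketch {
  private final Map<String, Object> sources = new ConcurrentHashMap<>();
  private final AtomicInteger registrations = new AtomicInteger();

  // Called when the first region of a table opens (i.e. proactively,
  // before any flush). computeIfAbsent creates and registers the
  // source exactly once per table, even when several openings race.
  public Object getOrCreateTableSource(String table) {
    return sources.computeIfAbsent(table, t -> {
      registrations.incrementAndGet();  // stand-in for the expensive registry work
      return new Object();              // stand-in for the table metrics source
    });
  }

  public int registrationCount() { return registrations.get(); }
}
```

The same computeIfAbsent shape also addresses the concurrent-creation hazard: two racing callers always observe the same fully created source, never a half-registered one.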
[jira] [Created] (HBASE-21149) TestIncrementalBackupWithBulkLoad may fail due to file copy failure
Ted Yu created HBASE-21149: -- Summary: TestIncrementalBackupWithBulkLoad may fail due to file copy failure Key: HBASE-21149 URL: https://issues.apache.org/jira/browse/HBASE-21149 Project: HBase Issue Type: Test Components: backuprestore Reporter: Ted Yu >From >https://builds.apache.org/job/HBase%20Nightly/job/master/471/testReport/junit/org.apache.hadoop.hbase.backup/TestIncrementalBackupWithBulkLoad/TestIncBackupDeleteTable/ > : {code} 2018-09-03 11:54:30,526 ERROR [Time-limited test] impl.TableBackupClient(235): Unexpected Exception : Failed copy from hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/0f626c66493649daaf84057b8dd71a30_SeqId_205_,hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/ad8df6415bd9459d9b3df76c588d79df_SeqId_205_ to hdfs://localhost:53075/backupUT/backup_1535975655488 java.io.IOException: Failed copy from hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/0f626c66493649daaf84057b8dd71a30_SeqId_205_,hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/ad8df6415bd9459d9b3df76c588d79df_SeqId_205_ to hdfs://localhost:53075/backupUT/backup_1535975655488 at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.incrementalCopyHFiles(IncrementalTableBackupClient.java:351) at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.copyBulkLoadedFiles(IncrementalTableBackupClient.java:219) at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.handleBulkLoad(IncrementalTableBackupClient.java:198) at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:320) at 
org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:605) at org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.TestIncBackupDeleteTable(TestIncrementalBackupWithBulkLoad.java:104) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) {code} However, some part of the test output was lost: {code} 2018-09-03 11:53:36,793 DEBUG [RS:0;765c9ca5ea28:36357] regions ...[truncated 398396 chars]... 8) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21141) Enable MOB in backup / restore test involving incremental backup
Ted Yu created HBASE-21141: -- Summary: Enable MOB in backup / restore test involving incremental backup Key: HBASE-21141 URL: https://issues.apache.org/jira/browse/HBASE-21141 Project: HBase Issue Type: Test Components: backuprestore Reporter: Ted Yu Currently we have only one test (TestRemoteBackup) where the MOB feature is enabled. That test performs only a full backup. This issue is to enable MOB in backup / restore test(s) involving incremental backup. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21139) Concurrent invocations of MetricsTableAggregateSourceImpl.getOrCreateTableSource may return unregistered MetricsTableSource
Ted Yu created HBASE-21139: -- Summary: Concurrent invocations of MetricsTableAggregateSourceImpl.getOrCreateTableSource may return unregistered MetricsTableSource Key: HBASE-21139 URL: https://issues.apache.org/jira/browse/HBASE-21139 Project: HBase Issue Type: Bug Reporter: Ted Yu >From test output of TestRestoreFlushSnapshotFromClient : {code} 2018-09-01 21:09:38,174 WARN [member: 'hw13463.attlocal.net,49623,1535861370108' subprocedure-pool6-thread-1] snapshot. RegionServerSnapshotManager$SnapshotSubprocedurePool(348): Got Exception in SnapshotSubprocedurePool java.util.concurrent.ExecutionException: java.lang.NullPointerException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:324) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:173) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:193) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:189) at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.MetricsTableSourceImpl.updateFlushTime(MetricsTableSourceImpl.java:375) at org.apache.hadoop.hbase.regionserver.MetricsTable.updateFlushTime(MetricsTable.java:56) at org.apache.hadoop.hbase.regionserver.MetricsRegionServer.updateFlush(MetricsRegionServer.java:210) at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2826) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2444) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2416) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2306) at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2209) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:115) at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:77) {code} In MetricsTableAggregateSourceImpl.getOrCreateTableSource : {code} MetricsTableSource prev = tableSources.putIfAbsent(table, source); if (prev != null) { return prev; } else { // register the new metrics now register(source); {code} Suppose threads t1 and t2 execute the above code concurrently. t1 calls putIfAbsent first and proceeds to running {{register(source)}}. Context switches, t2 gets to putIfAbsent and retrieves the instance stored by t1 which is not registered yet. We would end up with what the stack trace showed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
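One way to close the window described above is to make registration happen atomically with publication. The sketch below is self-contained and uses plain JDK classes rather than the real MetricsTableSource API (the Source class and its registered flag are stand-ins): ConcurrentHashMap.computeIfAbsent runs the factory under the map's per-key lock, so no thread can ever retrieve an instance that has been published but not yet registered.

```java
import java.util.concurrent.ConcurrentHashMap;

public class TableSourceRegistry {
    // Stand-in for MetricsTableSource; 'registered' models register(source).
    static class Source {
        volatile boolean registered = false;
        void register() { registered = true; }
    }

    private final ConcurrentHashMap<String, Source> tableSources = new ConcurrentHashMap<>();

    // computeIfAbsent runs the factory atomically per key, so a concurrent
    // caller can never receive a Source that was published before register().
    Source getOrCreate(String table) {
        return tableSources.computeIfAbsent(table, t -> {
            Source source = new Source();
            source.register(); // register before the map makes the value visible
            return source;
        });
    }
}
```

One caveat of this shape: the mapping function executes while the map's internal lock for that key is held, so a real register() implementation must be quick and must not re-enter the same map.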
[jira] [Created] (HBASE-21138) Close HRegion instance at the end of every test in TestHRegion
Ted Yu created HBASE-21138: -- Summary: Close HRegion instance at the end of every test in TestHRegion Key: HBASE-21138 URL: https://issues.apache.org/jira/browse/HBASE-21138 Project: HBase Issue Type: Test Reporter: Ted Yu TestHRegion has over 100 tests. The following is from one subtest: {code} public void testCompactionAffectedByScanners() throws Exception { byte[] family = Bytes.toBytes("family"); this.region = initHRegion(tableName, method, CONF, family); {code} this.region is not closed at the end of the subtest. testToShowNPEOnRegionScannerReseek is another example. Every subtest should use the following construct toward the end: {code} } finally { HBaseTestingUtility.closeRegionAndWAL(this.region); {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
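The try/finally construct suggested above can be sketched with a stand-in Region class; close() here models HBaseTestingUtility.closeRegionAndWAL so the example stays self-contained and runnable outside HBase.

```java
public class CloseRegionInFinally {
    // Stand-in for HRegion; close() models HBaseTestingUtility.closeRegionAndWAL.
    static class Region implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    // The subtest body runs inside try, and the region is closed in finally,
    // so cleanup happens whether the assertions pass or throw.
    static Region runSubtest(Runnable body) {
        Region region = new Region();
        try {
            body.run();
        } catch (AssertionError ignored) {
            // a failing subtest still reaches the finally block below
        } finally {
            region.close();
        }
        return region;
    }
}
```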
[jira] [Resolved] (HBASE-14783) Proc-V2: Master aborts when downgrading from 1.3 to 1.1
[ https://issues.apache.org/jira/browse/HBASE-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-14783. Resolution: Later > Proc-V2: Master aborts when downgrading from 1.3 to 1.1 > --- > > Key: HBASE-14783 > URL: https://issues.apache.org/jira/browse/HBASE-14783 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Stephen Yuan Jiang >Priority: Major > > I was running ITBLL with 1.3 deployed on a 6 node cluster. > Then I stopped the cluster, deployed 1.1 release and tried to start cluster. > However, master failed to start due to: > {code} > 2015-11-06 00:58:40,351 FATAL [eval-test-2:2.activeMasterManager] > master.HMaster: Failed to become active master > java.io.IOException: The procedure class > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure must be > accessible and have an empty constructor > at > org.apache.hadoop.hbase.procedure2.Procedure.newInstance(Procedure.java:548) > at org.apache.hadoop.hbase.procedure2.Procedure.convert(Procedure.java:640) > at > org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.read(ProcedureWALFormatReader.java:105) > at > org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:82) > at > org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:298) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:275) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.start(ProcedureExecutor.java:434) > at > org.apache.hadoop.hbase.master.HMaster.startProcedureExecutor(HMaster.java:1208) > at > org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1107) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:694) > at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:186) > at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1713) > at 
java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:191) > at > org.apache.hadoop.hbase.procedure2.Procedure.newInstance(Procedure.java:536) > ... 12 more > {code} > The cause was that ServerCrashProcedure, written in some WAL file under > MasterProcWALs from first run, was absent in 1.1 release. > After a brief discussion with Stephen, I am logging this JIRA to solicit > discussion on how customer experience can be improved if downgrade of hbase > is performed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-14716) Detection of orphaned table znode should cover table in Enabled state
[ https://issues.apache.org/jira/browse/HBASE-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-14716. Resolution: Later > Detection of orphaned table znode should cover table in Enabled state > - > > Key: HBASE-14716 > URL: https://issues.apache.org/jira/browse/HBASE-14716 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Labels: hbck > Attachments: 14716-branch-1-v1.txt, 14716.branch-1.v4.txt > > > HBASE-12070 introduced fix for orphaned table znode where table doesn't have > entry in hbase:meta > When Stephen and I investigated rolling upgrade failure, > {code} > 2015-10-27 18:21:10,668 WARN [ProcedureExecutorThread-3] > procedure.CreateTableProcedure: The table smoketest does not exist in meta > but has a znode. run hbck to fix inconsistencies. > {code} > we found that the orphaned table znode corresponded to table in Enabled state. > Therefore running hbck didn't report the inconsistency. > Detection for orphaned table znode should cover this case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21097) Flush pressure assertion may fail in testFlushThroughputTuning
Ted Yu created HBASE-21097: -- Summary: Flush pressure assertion may fail in testFlushThroughputTuning Key: HBASE-21097 URL: https://issues.apache.org/jira/browse/HBASE-21097 Project: HBase Issue Type: Test Reporter: Ted Yu From https://builds.apache.org/job/PreCommit-HBASE-Build/14137/artifact/patchprocess/patch-unit-hbase-server.txt : {code} [ERROR] testFlushThroughputTuning(org.apache.hadoop.hbase.regionserver.throttle.TestFlushWithThroughputController) Time elapsed: 17.446 s <<< FAILURE! java.lang.AssertionError: expected:<0.0> but was:<1.2906294173808417E-6> at org.apache.hadoop.hbase.regionserver.throttle.TestFlushWithThroughputController.testFlushThroughputTuning(TestFlushWithThroughputController.java:185) {code} Here is the related assertion: {code} assertEquals(0.0, regionServer.getFlushPressure(), EPSILON); {code} where EPSILON = 1E-6. In the above case, the observed value exceeded the epsilon by a margin of 2.9E-7, so the assertion didn't pass. It seems the epsilon can be adjusted to accommodate different workload / hardware combinations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
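A hedged sketch of the adjustment: widen the tolerance by an order of magnitude so a residual flush pressure on slower hardware (like the 1.29E-6 observed above) still counts as effectively zero. The chosen value is an assumption for illustration, not the project's decision.

```java
public class FlushPressureAssert {
    // Hypothetical widened tolerance: one order of magnitude above the 1E-6
    // epsilon that the failing run exceeded by only 2.9E-7.
    static final double EPSILON = 1E-5;

    // Models assertEquals(0.0, flushPressure, EPSILON) from the test.
    static boolean effectivelyZero(double flushPressure) {
        return Math.abs(flushPressure - 0.0) <= EPSILON;
    }
}
```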
[jira] [Created] (HBASE-21088) HStoreFile should be closed in HStore#hasReferences
Ted Yu created HBASE-21088: -- Summary: HStoreFile should be closed in HStore#hasReferences Key: HBASE-21088 URL: https://issues.apache.org/jira/browse/HBASE-21088 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu {code} reloadedStoreFiles = loadStoreFiles(); return StoreUtils.hasReferences(reloadedStoreFiles); {code} The intention of obtaining the HStoreFiles is to check for references. The loaded HStoreFiles should be closed before returning, to prevent a leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
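The shape of the fix can be sketched as: inspect the loaded files, then close every one in a finally block so the reference check never leaks open files even when it returns early. StoreFile below is a stand-in, not the real HStoreFile API.

```java
import java.util.List;

public class CloseAfterCheck {
    // Stand-in for HStoreFile; close() models releasing the file's readers.
    static class StoreFile implements AutoCloseable {
        final boolean isReference;
        boolean closed = false;
        StoreFile(boolean isReference) { this.isReference = isReference; }
        @Override public void close() { closed = true; }
    }

    // Mirrors the intent of HStore#hasReferences: files are loaded only to be
    // inspected, so every one is closed before the result is returned.
    static boolean hasReferences(List<StoreFile> loaded) {
        try {
            for (StoreFile f : loaded) {
                if (f.isReference) {
                    return true;
                }
            }
            return false;
        } finally {
            for (StoreFile f : loaded) {
                f.close(); // runs even when we return early above
            }
        }
    }
}
```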
[jira] [Created] (HBASE-21076) TestTableResource fails with NPE
Ted Yu created HBASE-21076: -- Summary: TestTableResource fails with NPE Key: HBASE-21076 URL: https://issues.apache.org/jira/browse/HBASE-21076 Project: HBase Issue Type: Test Reporter: Ted Yu The following can be observed in the master branch: {code} java.lang.NullPointerException at org.apache.hadoop.hbase.rest.TestTableResource.setUpBeforeClass(TestTableResource.java:134) {code} The NPE comes from the following in TestEndToEndSplitTransaction : {code} compactAndBlockUntilDone(TEST_UTIL.getAdmin(), TEST_UTIL.getMiniHBaseCluster().getRegionServer(0), daughterA.getRegionName()); {code} An initial check of the code shows that TestEndToEndSplitTransaction uses a TEST_UTIL instance created within TestEndToEndSplitTransaction itself. However, TestTableResource creates its own instance of HBaseTestingUtility. This means TEST_UTIL.getMiniHBaseCluster() returns null, since the instance created by TestEndToEndSplitTransaction has hbaseCluster set to null. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21042) processor.getRowsToLock() always assumes there is some row being locked in HRegion#processRowsWithLocks
Ted Yu created HBASE-21042: -- Summary: processor.getRowsToLock() always assumes there is some row being locked in HRegion#processRowsWithLocks Key: HBASE-21042 URL: https://issues.apache.org/jira/browse/HBASE-21042 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu [~tdsilva] reported at the tail of HBASE-18998 that the fix for HBASE-18998 missed the finally block of HRegion#processRowsWithLocks. This issue is to fix that remaining call. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21040) printStackTrace() is used in RestoreDriver in case Exception is caught
Ted Yu created HBASE-21040: -- Summary: printStackTrace() is used in RestoreDriver in case Exception is caught Key: HBASE-21040 URL: https://issues.apache.org/jira/browse/HBASE-21040 Project: HBase Issue Type: Bug Reporter: Ted Yu Here is the related code: {code} } catch (Exception e) { e.printStackTrace(); {code} The correct way to log a stack trace is to use the Logger instance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
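A minimal sketch of the suggested change, using java.util.logging only so the example stays self-contained (HBase itself logs through its own logging facade); the point is that the throwable is handed to the logger, which formats the stack trace through the configured handlers and levels instead of writing raw to stderr.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class RestoreDriverLogging {
    private static final Logger LOG = Logger.getLogger(RestoreDriverLogging.class.getName());

    // Instead of e.printStackTrace(): pass the throwable to the logger so the
    // stack trace honors the configured handlers and log levels.
    static String handleFailure(Exception e) {
        LOG.log(Level.SEVERE, "Restore failed", e);
        return e.getMessage();
    }
}
```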
[jira] [Created] (HBASE-20988) TestShell shouldn't be skipped for hbase-shell module test
Ted Yu created HBASE-20988: -- Summary: TestShell shouldn't be skipped for hbase-shell module test Key: HBASE-20988 URL: https://issues.apache.org/jira/browse/HBASE-20988 Project: HBase Issue Type: Test Reporter: Ted Yu Here is snippet for QA run 13862 for HBASE-20985 : {code} 13:42:50 cd /testptch/hbase/hbase-shell 13:42:50 /usr/share/maven/bin/mvn -Dmaven.repo.local=/home/jenkins/yetus-m2/hbase-master-patch-1 -DHBasePatchProcess -PrunAllTests -Dtest.exclude.pattern=**/master.normalizer. TestSimpleRegionNormalizerOnCluster.java,**/replication.regionserver.TestSerialReplicationEndpoint.java,**/master.procedure.TestServerCrashProcedure.java,**/master.procedure.TestCreateTableProcedure. java,**/TestClientOperationTimeout.java,**/client.TestSnapshotFromClientWithRegionReplicas.java,**/master.TestAssignmentManagerMetrics.java,**/client.TestShell.java,**/client. TestCloneSnapshotFromClientWithRegionReplicas.java,**/master.TestDLSFSHLog.java,**/replication.TestReplicationSmallTestsSync.java,**/master.procedure.TestModifyTableProcedure.java,**/regionserver. TestCompactionInDeadRegionServer.java,**/client.TestFromClientSide3.java,**/master.procedure.TestRestoreSnapshotProcedure.java,**/client.TestRestoreSnapshotFromClient.java,**/security.access. TestCoprocessorWhitelistMasterObserver.java,**/replication.regionserver.TestDrainReplicationQueuesForStandBy.java,**/master.procedure.TestProcedurePriority.java,**/master.locking.TestLockProcedure. java,**/master.cleaner.TestSnapshotFromMaster.java,**/master.assignment.TestSplitTableRegionProcedure.java,**/client.TestMobRestoreSnapshotFromClient.java,**/replication.TestReplicationKillSlaveRS. 
java,**/regionserver.TestHRegion.java,**/security.access.TestAccessController.java,**/master.procedure.TestTruncateTableProcedure.java,**/client.TestAsyncReplicationAdminApiWithClusters.java,**/ coprocessor.TestMetaTableMetrics.java,**/client.TestMobSnapshotCloneIndependence.java,**/namespace.TestNamespaceAuditor.java,**/master.TestMasterAbortAndRSGotKilled.java,**/client.TestAsyncTable.java,**/master.TestMasterOperationsForRegionReplicas.java,**/util.TestFromClientSide3WoUnsafe.java,**/client.TestSnapshotCloneIndependence.java,**/client.TestAsyncDecommissionAdminApi.java,**/client. TestRestoreSnapshotFromClientWithRegionReplicas.java,**/master.assignment.TestMasterAbortWhileMergingTable.java,**/client.TestFromClientSide.java,**/client.TestAdmin1.java,**/client. TestFromClientSideWithCoprocessor.java,**/replication.TestReplicationKillSlaveRSWithSeparateOldWALs.java,**/master.procedure.TestMasterFailoverWithProcedures.java,**/regionserver. TestSplitTransactionOnCluster.java clean test -fae > /testptch/patchprocess/patch-unit-hbase-shell.txt 2>&1 {code} In this case, there was modification to shell script, leading to running shell tests. However, TestShell was excluded in the QA run, defeating the purpose. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20968) list_procedures_test fails due to no matching regex
Ted Yu created HBASE-20968: -- Summary: list_procedures_test fails due to no matching regex Key: HBASE-20968 URL: https://issues.apache.org/jira/browse/HBASE-20968 Project: HBase Issue Type: Test Reporter: Ted Yu >From test output against hadoop3: {code} 2018-07-28 12:04:24,838 DEBUG [Time-limited test] procedure2.ProcedureExecutor(948): Stored pid=12, state=RUNNABLE, hasLock=false; org.apache.hadoop.hbase.client.procedure. ShellTestProcedure 2018-07-28 12:04:24,864 INFO [RS-EventLoopGroup-1-3] ipc.ServerRpcConnection(556): Connection from 172.18.128.12:46918, version=3.0.0-SNAPSHOT, sasl=false, ugi=hbase (auth: SIMPLE), service=MasterService 2018-07-28 12:04:24,900 DEBUG [Thread-114] master.MasterRpcServices(1157): Checking to see if procedure is done pid=11 ^[[38;5;196mF^[[0m === Failure: ^[[48;5;124;38;5;231;1mtest_list_procedures(Hbase::ListProceduresTest)^[[0m src/test/ruby/shell/list_procedures_test.rb:65:in `block in test_list_procedures' 62: end 63: end 64: ^[[48;5;124;38;5;231;1m => 65: assert_equal(1, matching_lines)^[[0m 66: end 67: end 68: end <^[[48;5;34;38;5;231;1m1^[[0m> expected but was <^[[48;5;124;38;5;231;1m0^[[0m> === ... 2018-07-28 12:04:25,374 INFO [PEWorker-9] procedure2.ProcedureExecutor(1316): Finished pid=12, state=SUCCESS, hasLock=false; org.apache.hadoop.hbase.client.procedure. ShellTestProcedure in 336msec {code} The completion of the ShellTestProcedure was after the assertion was raised. {code} def create_procedure_regexp(table_name) regexp_string = '[0-9]+ .*ShellTestProcedure SUCCESS.*' \ {code} The regex used by the test isn't found in test output either. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20966) RestoreTool#getTableInfoPath should look for completed snapshot only
Ted Yu created HBASE-20966: -- Summary: RestoreTool#getTableInfoPath should look for completed snapshot only Key: HBASE-20966 URL: https://issues.apache.org/jira/browse/HBASE-20966 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu [~gubjanos] reported seeing the following error when running backup / restore test on Azure: {code} 2018-07-25 17:03:56,661|INFO|MainThread|machine.py:167 - run()||GUID=e7de7672-ebfd-402d-8f1f-68e7e8444cb1|org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:wasb://hbase3-m30wub1711kond-115...@humbtesting8wua.blob.core.windows.net/user/hbase/backup_loc/backup_1532538064246/default/table_fnfawii1za/.hbase-snapshot/.tmp/. snapshotinfo 2018-07-25 17:03:56,661|INFO|MainThread|machine.py:167 - run()||GUID=e7de7672-ebfd-402d-8f1f-68e7e8444cb1|at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:328) 2018-07-25 17:03:56,661|INFO|MainThread|machine.py:167 - run()||GUID=e7de7672-ebfd-402d-8f1f-68e7e8444cb1|at org.apache.hadoop.hbase.backup.util.RestoreServerUtil.getTableDesc(RestoreServerUtil.java:237) 2018-07-25 17:03:56,662|INFO|MainThread|machine.py:167 - run()||GUID=e7de7672-ebfd-402d-8f1f-68e7e8444cb1|at org.apache.hadoop.hbase.backup.util.RestoreServerUtil.restoreTableAndCreate(RestoreServerUtil.java:351) 2018-07-25 17:03:56,662|INFO|MainThread|machine.py:167 - run()||GUID=e7de7672-ebfd-402d-8f1f-68e7e8444cb1|at org.apache.hadoop.hbase.backup.util.RestoreServerUtil.fullRestoreTable(RestoreServerUtil.java:186) {code} Here is related code in master branch: {code} Path getTableInfoPath(TableName tableName) throws IOException { Path tableSnapShotPath = getTableSnapshotPath(backupRootPath, tableName, backupId); Path tableInfoPath = null; // can't build the path directly as the timestamp values are different FileStatus[] snapshots = fs.listStatus(tableSnapShotPath); {code} In the above code, we don't exclude incomplete snapshot, leading 
to an exception later when reading the snapshot info. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
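A hedged sketch of the kind of filtering the fix implies: exclude anything under a ".tmp" directory, where in-progress snapshots live (as in the ".hbase-snapshot/.tmp/..." path from the report). The directory layout is an assumption modeled on those paths, not the actual RestoreTool code.

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotDirFilter {
    // In-progress snapshots live under a ".tmp" subdirectory; completed
    // snapshots sit directly under the snapshot root.
    static List<String> completedOnly(List<String> snapshotDirs) {
        List<String> completed = new ArrayList<>();
        for (String dir : snapshotDirs) {
            if (!dir.contains("/.tmp/") && !dir.endsWith("/.tmp")) {
                completed.add(dir);
            }
        }
        return completed;
    }
}
```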
[jira] [Created] (HBASE-20917) MetaTableMetrics#stop references uninitialized requestsMap for non-meta region
Ted Yu created HBASE-20917: -- Summary: MetaTableMetrics#stop references uninitialized requestsMap for non-meta region Key: HBASE-20917 URL: https://issues.apache.org/jira/browse/HBASE-20917 Project: HBase Issue Type: Bug Reporter: Ted Yu I noticed the following in test output: {code} 2018-07-21 15:54:43,181 ERROR [RS_CLOSE_REGION-regionserver/172.17.5.4:0-1] executor.EventHandler(186): Caught throwable while processing event M_RS_CLOSE_REGION java.lang.NullPointerException at org.apache.hadoop.hbase.coprocessor.MetaTableMetrics.stop(MetaTableMetrics.java:329) at org.apache.hadoop.hbase.coprocessor.BaseEnvironment.shutdown(BaseEnvironment.java:91) at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionEnvironment.shutdown(RegionCoprocessorHost.java:165) at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.shutdown(CoprocessorHost.java:290) at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$4.postEnvCall(RegionCoprocessorHost.java:559) at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:622) at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postClose(RegionCoprocessorHost.java:551) at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1678) at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1484) at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:104) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) {code} {{requestsMap}} is only initialized for the meta region. However, check for meta region is absent in the stop method: {code} public void stop(CoprocessorEnvironment e) throws IOException { // since meta region can move around, clear stale metrics when stop. for (String meterName : requestsMap.keySet()) { {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
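The missing guard can be sketched with a stand-in class rather than the real MetaTableMetrics (the requestsMap field name matches the report; everything else is illustrative): stop() simply returns early when the map was never initialized, which is the non-meta-region case.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MetaMetricsStop {
    // Per the report, requestsMap is initialized only for the meta region and
    // stays null for every other region.
    private Map<String, Long> requestsMap;

    void startForMetaRegion() {
        requestsMap = new ConcurrentHashMap<>();
        requestsMap.put("get", 0L);
    }

    // The missing check: skip cleanup when the map was never created.
    void stop() {
        if (requestsMap == null) {
            return; // non-meta region: nothing to clear, and no NPE
        }
        requestsMap.clear();
    }

    boolean isCleared() {
        return requestsMap == null || requestsMap.isEmpty();
    }
}
```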
[jira] [Created] (HBASE-20892) [UI] Start / End keys are empty on table.jsp
Ted Yu created HBASE-20892: -- Summary: [UI] Start / End keys are empty on table.jsp Key: HBASE-20892 URL: https://issues.apache.org/jira/browse/HBASE-20892 Project: HBase Issue Type: Bug Affects Versions: 2.0.1 Reporter: Ted Yu When viewing table.jsp?name=TestTable, I found that the Start / End keys for all the regions were simply dashes without real values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20879) Compacting memstore config should handle lower case
Ted Yu created HBASE-20879: -- Summary: Compacting memstore config should handle lower case Key: HBASE-20879 URL: https://issues.apache.org/jira/browse/HBASE-20879 Project: HBase Issue Type: Bug Affects Versions: 2.0.1 Reporter: Tushar Sharma Assignee: Ted Yu Tushar reported seeing the following in region server log when entering 'basic' for compacting memstore type: {code} 2018-07-10 19:43:45,944 ERROR [RS_OPEN_REGION-regionserver/c01s22:16020-0] handler.OpenRegionHandler: Failed open of region=usertable,user6379,1531182972304.69abd81a44e9cc3ef9e150709f4f69ab., starting to roll back the global memstore size. java.io.IOException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.MemoryCompactionPolicy.basic at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1035) at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:900) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:872) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7048) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7006) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6977) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6933) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6884) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:109) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.MemoryCompactionPolicy.basic 
at java.lang.Enum.valueOf(Enum.java:238) at org.apache.hadoop.hbase.MemoryCompactionPolicy.valueOf(MemoryCompactionPolicy.java:26) at org.apache.hadoop.hbase.regionserver.HStore.getMemstore(HStore.java:331) at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:271) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5531) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:999) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:996) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 3 more 2018-07-10 19:43:45,944 ERROR [RS_OPEN_REGION-regionserver/c01s22:16020-1] handler.OpenRegionHandler: Failed open of region=temp,,1530511278693.0be48eedc68b9358aa475946d00571f1., starting to roll back the global memstore size. java.io.IOException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.MemoryCompactionPolicy.basic at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1035) at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:900) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:872) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7048) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7006) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6977) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6933) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6884) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284) at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:109) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.MemoryCompactionPolicy.basic at java.lang.Enum.valueOf(Enum.java:238) at org.apache.hadoop.hbase.MemoryCompactionPolicy.valueOf(MemoryCompactionPolicy.java:26) at
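A sketch of the kind of normalization that makes the configuration case-insensitive: upper-case the configured value before Enum.valueOf, so 'basic' resolves to BASIC. The enum here is a local stand-in for org.apache.hadoop.hbase.MemoryCompactionPolicy, not the real class.

```java
import java.util.Locale;

public class MemPolicyParse {
    // Local stand-in for org.apache.hadoop.hbase.MemoryCompactionPolicy.
    enum MemoryCompactionPolicy { NONE, BASIC, EAGER, ADAPTIVE }

    // Enum.valueOf matches constant names exactly, so normalize the user's
    // input first; 'basic', 'Basic' and 'BASIC' all parse to BASIC.
    static MemoryCompactionPolicy parse(String configuredValue) {
        return MemoryCompactionPolicy.valueOf(
            configuredValue.trim().toUpperCase(Locale.ROOT));
    }
}
```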
[jira] [Created] (HBASE-20744) Address FindBugs warnings in branch-1
Ted Yu created HBASE-20744: -- Summary: Address FindBugs warnings in branch-1 Key: HBASE-20744 URL: https://issues.apache.org/jira/browse/HBASE-20744 Project: HBase Issue Type: Bug Reporter: Ted Yu >From >https://builds.apache.org/job/HBase%20Nightly/job/branch-1/350//JDK8_Nightly_Build_Report_(Hadoop2)/ > : {code} FindBugsmodule:hbase-common Inconsistent synchronization of org.apache.hadoop.hbase.io.encoding.EncodedDataBlock$BufferGrabbingByteArrayOutputStream.ourBytes; locked 50% of time Unsynchronized access at EncodedDataBlock.java:50% of time Unsynchronized access at EncodedDataBlock.java:[line 258] {code} {code} FindBugsmodule:hbase-hadoop2-compat java.util.concurrent.ScheduledThreadPoolExecutor stored into non-transient field MetricsExecutorImpl$ExecutorSingleton.scheduler At MetricsExecutorImpl.java:MetricsExecutorImpl$ExecutorSingleton.scheduler At MetricsExecutorImpl.java:[line 51] {code} {code} FindBugsmodule:hbase-server instanceof will always return false in org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(Region, int, int, int), since a org.apache.hadoop.hbase.quotas.RpcThrottlingException can't be a org.apache.hadoop.hbase.quotas.ThrottlingException At RegionServerQuotaManager.java:in org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(Region, int, int, int), since a org.apache.hadoop.hbase.quotas.RpcThrottlingException can't be a org.apache.hadoop.hbase.quotas.ThrottlingException At RegionServerQuotaManager.java:[line 193] instanceof will always return true for all non-null values in org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(Region, int, int, int), since all org.apache.hadoop.hbase.quotas.RpcThrottlingException are instances of org.apache.hadoop.hbase.quotas.RpcThrottlingException At RegionServerQuotaManager.java:for all non-null values in org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(Region, int, int, int), since all org.apache.hadoop.hbase.quotas.RpcThrottlingException 
are instances of org.apache.hadoop.hbase.quotas.RpcThrottlingException At RegionServerQuotaManager.java:[line 199] {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20743) ASF License warnings for branch-1
Ted Yu created HBASE-20743: -- Summary: ASF License warnings for branch-1 Key: HBASE-20743 URL: https://issues.apache.org/jira/browse/HBASE-20743 Project: HBase Issue Type: Bug Reporter: Ted Yu From https://builds.apache.org/job/HBase%20Nightly/job/branch-1/350/artifact/output-general/patch-asflicense-problems.txt : {code} Lines that start with ? in the ASF License report indicate files that do not have an Apache license header: !? hbase-error-prone/target/checkstyle-result.xml !? hbase-error-prone/target/classes/META-INF/services/com.google.errorprone.bugpatterns.BugChecker !? hbase-error-prone/target/maven-status/maven-compiler-plugin/compile/default-compile/inputFiles.lst !? hbase-error-prone/target/maven-status/maven-compiler-plugin/compile/default-compile/createdFiles.lst {code} It looks like these files should be excluded. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir
Ted Yu created HBASE-20734: -- Summary: Colocate recovered edits directory with hbase.wal.dir Key: HBASE-20734 URL: https://issues.apache.org/jira/browse/HBASE-20734 Project: HBase Issue Type: Improvement Reporter: Ted Yu During investigation of HBASE-20723, I realized that we wouldn't get the best performance w.r.t. recovered edits when hbase.wal.dir is configured to be on different (fast) media than the hbase rootdir, since the recovered edits directory is currently under rootdir. Such a setup may not result in fast recovery when there is a region server failover. This issue is to find a proper (hopefully backward-compatible) way to colocate the recovered edits directory with hbase.wal.dir . -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-20672) Create new HBase metrics ReadRequestRate and WriteRequestRate that reset at every monitoring interval
[ https://issues.apache.org/jira/browse/HBASE-20672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-20672: > Create new HBase metrics ReadRequestRate and WriteRequestRate that reset at > every monitoring interval > - > > Key: HBASE-20672 > URL: https://issues.apache.org/jira/browse/HBASE-20672 > Project: HBase > Issue Type: Improvement > Components: metrics > Reporter: Ankit Jain > Assignee: Ankit Jain > Priority: Minor > Fix For: 3.0.0 > > Attachments: HBASE-20672.branch-1.001.patch, > HBASE-20672.master.001.patch, HBASE-20672.master.002.patch, > HBASE-20672.master.003.patch, hits1vs2.4.40.400.png > > > HBase currently provides cumulative read/write request counters > (ReadRequestCount, WriteRequestCount). Because counters that reset only after a > restart of the service are hard to use for monitoring, we would like to expose > two new metrics in HBase, ReadRequestRate and WriteRequestRate, at the region > server level. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20577) Make Log Level page design consistent with the design of other pages in UI
[ https://issues.apache.org/jira/browse/HBASE-20577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-20577. Resolution: Fixed Thanks for the addendum > Make Log Level page design consistent with the design of other pages in UI > -- > > Key: HBASE-20577 > URL: https://issues.apache.org/jira/browse/HBASE-20577 > Project: HBase > Issue Type: Improvement > Components: UI, Usability > Reporter: Nihal Jain > Assignee: Nihal Jain > Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-20577.master.001.patch, > HBASE-20577.master.002.patch, HBASE-20577.master.ADDENDUM.patch, > after_patch_LogLevel_CLI.png, after_patch_get_log_level.png, > after_patch_require_field_validation.png, after_patch_set_log_level_bad.png, > after_patch_set_log_level_success.png, > before_patch_no_validation_required_field.png, rest_after_addendum_patch.png > > > The Log Level page in the web UI seems out of place. I think we should make > it look consistent with the design of other pages in the HBase web UI. > Also, validation of required fields should be done; otherwise the user should > not be allowed to click the submit button. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20690) Moving table to target rsgroup needs to handle TableStateNotFoundException
Ted Yu created HBASE-20690: -- Summary: Moving table to target rsgroup needs to handle TableStateNotFoundException Key: HBASE-20690 URL: https://issues.apache.org/jira/browse/HBASE-20690 Project: HBase Issue Type: Bug Reporter: Ted Yu This is related code: {code} if (targetGroup != null) { for (TableName table: tables) { if (master.getAssignmentManager().isTableDisabled(table)) { LOG.debug("Skipping move regions because the table" + table + " is disabled."); continue; } {code} In a stack trace [~rmani] showed me: {code} 2018-06-06 07:10:44,893 ERROR [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=2] master.TableStateManager: Unable to get table demo:tbl1 state org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: demo:tbl1 at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:193) at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:143) at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:346) at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveTables(RSGroupAdminServer.java:407) at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.assignTableToGroup(RSGroupAdminEndpoint.java:447) at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postCreateTable(RSGroupAdminEndpoint.java:470) at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:334) at org.apache.hadoop.hbase.master.MasterCoprocessorHost$12.call(MasterCoprocessorHost.java:331) at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540) at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614) at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postCreateTable(MasterCoprocessorHost.java:331) at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1768) at 
org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131) at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1750) at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:593) at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) {code} The logic should take potential TableStateNotFoundException into account. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
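One way the move-tables logic could tolerate the missing table state is sketched below. This is not the actual RSGroupAdminServer code: MoveTablesSketch, its AssignmentManager interface, and the stand-in TableStateNotFoundException are illustrative, and treating "state not found" as "not disabled" is one assumed policy (reasonable when the table is mid-creation, as in the stack trace above).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: when the table state lookup throws, fall through and
// still move the table rather than letting the exception abort the move.
public class MoveTablesSketch {
    // Stand-in for TableStateManager.TableStateNotFoundException.
    static class TableStateNotFoundException extends Exception {
        TableStateNotFoundException(String table) { super(table); }
    }

    // Stand-in for the real AssignmentManager; checked exception in the
    // signature models the state lookup failing.
    interface AssignmentManager {
        boolean isTableDisabled(String table) throws TableStateNotFoundException;
    }

    // Returns the tables whose regions should actually be moved.
    static List<String> tablesToMove(List<String> tables, AssignmentManager am) {
        List<String> result = new ArrayList<>();
        for (String table : tables) {
            boolean disabled = false;
            try {
                disabled = am.isTableDisabled(table);
            } catch (TableStateNotFoundException e) {
                // State not yet published (e.g. table being created):
                // assume enabled and still move the table.
            }
            if (!disabled) {
                result.add(table);
            }
        }
        return result;
    }
}
```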
[jira] [Created] (HBASE-20680) Master hung during initialization waiting on hbase:meta to be assigned which never does
Ted Yu created HBASE-20680: -- Summary: Master hung during initialization waiting on hbase:meta to be assigned which never does Key: HBASE-20680 URL: https://issues.apache.org/jira/browse/HBASE-20680 Project: HBase Issue Type: Bug Reporter: Josh Elser When running IntegrationTestRSGroups, the test became hung waiting on the master to be initialized. The hbase cluster was launched without RSGroup config. The test script adds required RSGroup configs to hbase-site.xml and restarts the cluster. It seems that, at one point while the master was trying to assign meta, the destination regionserver was in the middle of going down. This has now left HBase in a state where it starts the regionserver recovery procedures, but never actually gets hbase:meta assigned. {code} 2018-06-01 10:47:50,024 INFO [PEWorker-5] procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=41, ppid=40, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740}] 2018-06-01 10:47:50,026 DEBUG [WALProcedureStoreSyncThread] wal.WALProcedureStore: hsync completed for hdfs://ctr-e138-1518143905142-340983-03-14.hwx.site:8020/apps/hbase/data/ MasterProcWALs/pv2-0002.log 2018-06-01 10:47:50,026 INFO [PEWorker-3] procedure.MasterProcedureScheduler: pid=41, ppid=40, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740 checking lock on 1588230740 2018-06-01 10:47:50,026 DEBUG [PEWorker-3] assignment.RegionStates: setting location=ctr-e138-1518143905142-340983-03-14.hwx.site,16020,1527849994190 for rit=OFFLINE, location=ctr- e138-1518143905142-340983-03-14.hwx.site,16020,1527849994190, table=hbase:meta, region=1588230740 last loc=null 2018-06-01 10:47:50,026 INFO [PEWorker-3] assignment.AssignProcedure: Starting pid=41, ppid=40, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta,region=1588230740; rit=OFFLINE, location=ctr-e138-1518143905142-340983-03-14.hwx.site,16020,1527849994190; forceNewPlan=false, 
retain=true target svr=null {code} At Fri Jun 1 10:48:04, master was restarted. The new master picked up pid=41: {code} 2018-06-01 10:48:47,971 INFO [PEWorker-1] assignment.AssignProcedure: Starting pid=41, ppid=40, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta,region=1588230740; rit=OFFLINE, location=null; forceNewPlan=false, retain=false target svr=null {code} There was no further log for pid=41 after above. Later when master initiated another meta recovery procedure (pid=42), the second procedure seems to be locked out by the former: {code} 2018-06-01 10:49:34,292 INFO [PEWorker-2] procedure.MasterProcedureScheduler: pid=43, ppid=42, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740, target=ctr-e138-1518143905142-340983-03-14.hwx.site,16020,1527849994190 checking lock on 1588230740 2018-06-01 10:49:34,293 DEBUG [PEWorker-2] assignment.RegionTransitionProcedure: LOCK_EVENT_WAIT pid=43 serverLocks={}, namespaceLocks={}, tableLocks={}, regionLocks={{1588230740=exclusiveLockOwner=41, sharedLockCount=0, waitingProcCount=1}}, peerLocks={} {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20677) Backport HBASE-20566 'Creating a system table after enabling rsgroup feature puts region into RIT ' to branch-2
Ted Yu created HBASE-20677: -- Summary: Backport HBASE-20566 'Creating a system table after enabling rsgroup feature puts region into RIT ' to branch-2 Key: HBASE-20677 URL: https://issues.apache.org/jira/browse/HBASE-20677 Project: HBase Issue Type: Task Reporter: Ted Yu After HBASE-20566 was integrated into master, HBASE-20595 removed the concept of 'special tables' from rsgroups. This task is to backport the fix to branch-2. TestRSGroups#testRSGroupsWithHBaseQuota would be added. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20676) Give .hbase-snapshot proper ownership upon directory creation
Ted Yu created HBASE-20676: -- Summary: Give .hbase-snapshot proper ownership upon directory creation Key: HBASE-20676 URL: https://issues.apache.org/jira/browse/HBASE-20676 Project: HBase Issue Type: Task Reporter: Ted Yu This is a continuation of the discussion over HBASE-20668. The .hbase-snapshot directory is not created at cluster startup. Normally it is created when a snapshot operation is initiated. However, if, before any snapshot operation is performed, some non-super user from another cluster runs ExportSnapshot against this cluster, the .hbase-snapshot directory would be created as that user. (This is just one scenario that can lead to wrong ownership.) This JIRA is to seek proper way(s) to ensure that the .hbase-snapshot directory always carries proper ownership and permissions upon creation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20668) Exception from FileSystem operation in finally block of ExportSnapshot#doWork may hide exception from FileUtil.copy call
Ted Yu created HBASE-20668: -- Summary: Exception from FileSystem operation in finally block of ExportSnapshot#doWork may hide exception from FileUtil.copy call Key: HBASE-20668 URL: https://issues.apache.org/jira/browse/HBASE-20668 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu I was debugging the following error [~romil.choksi] saw during testing ExportSnapshot : {code} 2018-06-01 02:40:52,363|INFO|MainThread|machine.py:167 - run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|2018-06-01 02:40:52,358 ERROR [main] util.AbstractHBaseTool: Error running command-line tool 2018-06-01 02:40:52,363|INFO|MainThread|machine.py:167 - run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|java.io.FileNotFoundException: Directory/File does not exist /apps/ hbase/data/.hbase-snapshot/.tmp/snapshot_table_334546 2018-06-01 02:40:52,364|INFO|MainThread|machine.py:167 - run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|at org.apache.hadoop.hdfs.server.namenode.FSDirectory. checkOwner(FSDirectory.java:1777) 2018-06-01 02:40:52,364|INFO|MainThread|machine.py:167 - run()||GUID=1cacb7bc-f7cc-4710-82e0-4a4513f0c1f9|at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp. setOwner(FSDirAttrOp.java:82) {code} Here is corresponding code (with extra log added): {code} try { LOG.info("Copy Snapshot Manifest from " + snapshotDir + " to " + initialOutputSnapshotDir); boolean ret = FileUtil.copy(inputFs, snapshotDir, outputFs, initialOutputSnapshotDir, false, false, conf); LOG.info("return val = " + ret); } catch (IOException e) { LOG.warn("Failed to copy the snapshot directory: from=" + snapshotDir + " to=" + initialOutputSnapshotDir, e); throw new ExportSnapshotException("Failed to copy the snapshot directory: from=" + snapshotDir + " to=" + initialOutputSnapshotDir, e); } finally { if (filesUser != null || filesGroup != null) { LOG.warn((filesUser == null ? "" : "Change the owner of " + needSetOwnerDir + " to " + filesUser) + (filesGroup == null ? 
"" : ", Change the group of " + needSetOwnerDir + " to " + filesGroup)); setOwner(outputFs, needSetOwnerDir, filesUser, filesGroup, true); } {code} "return val = " was not seen in rerun of the test. This is what the additional log revealed: {code} 2018-06-01 09:22:54,247|INFO|MainThread|machine.py:167 - run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|2018-06-01 09:22:54,241 WARN [main] snapshot.ExportSnapshot: Failed to copy the snapshot directory: from=hdfs://ns1/apps/hbase/data/.hbase-snapshot/snapshot_table_157842 to=hdfs://ns3/apps/hbase/data/.hbase-snapshot/.tmp/snapshot_table_157842 2018-06-01 09:22:54,248|INFO|MainThread|machine.py:167 - run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=WRITE, inode="/apps/hbase/data/.hbase-snapshot/.tmp":hrt_qa:hadoop:drx-wT 2018-06-01 09:22:54,248|INFO|MainThread|machine.py:167 - run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker. check(FSPermissionChecker.java:399) 2018-06-01 09:22:54,249|INFO|MainThread|machine.py:167 - run()||GUID=3961d249-9981-429d-81a8-39c7df53cf58|at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker. checkPermission(FSPermissionChecker.java:255) {code} It turned out that the exception from {{setOwner}} call in the finally block eclipsed the real exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
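The masking pattern described above can be reduced to a small sketch. copySnapshot and setOwner here are illustrative stand-ins for the FileUtil.copy call and the setOwner helper, not the actual ExportSnapshot code; the fix shown (guarding the finally-block cleanup with its own try/catch) is one common remedy, not necessarily the patch applied on this issue.

```java
import java.io.IOException;

// Sketch only: how an exception thrown in a finally block eclipses the
// primary exception, and one way to avoid it.
public class FinallyMaskingSketch {
    static void copySnapshot() throws IOException {
        throw new IOException("copy failed"); // the real failure
    }

    static void setOwner() throws IOException {
        throw new IOException("setOwner failed"); // cleanup failure
    }

    // Broken pattern: the finally-block exception replaces "copy failed".
    static String brokenPattern() {
        try {
            try {
                copySnapshot();
            } finally {
                setOwner(); // throws, masking the copy failure
            }
        } catch (IOException e) {
            return e.getMessage();
        }
        return "no exception";
    }

    // Fixed pattern: catch and log the cleanup failure separately so the
    // original exception propagates.
    static String fixedPattern() {
        try {
            try {
                copySnapshot();
            } finally {
                try {
                    setOwner();
                } catch (IOException e) {
                    System.err.println("setOwner failed, keeping original: " + e);
                }
            }
        } catch (IOException e) {
            return e.getMessage();
        }
        return "no exception";
    }
}
```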
[jira] [Reopened] (HBASE-20639) Implement permission checking through AccessController instead of RSGroupAdminEndpoint
[ https://issues.apache.org/jira/browse/HBASE-20639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-20639: > Implement permission checking through AccessController instead of > RSGroupAdminEndpoint > -- > > Key: HBASE-20639 > URL: https://issues.apache.org/jira/browse/HBASE-20639 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Nihal Jain >Priority: Major > Attachments: HBASE-20639.master.001.patch, > HBASE-20639.master.002.patch, HBASE-20639.master.002.patch > > > Currently permission checking for various RS group operations is done via > RSGroupAdminEndpoint. > e.g. in RSGroupAdminServiceImpl#moveServers() : > {code} > checkPermission("moveServers"); > groupAdminServer.moveServers(hostPorts, request.getTargetGroup()); > {code} > The practice in remaining parts of hbase is to perform permission checking > within AccessController. > Now that observer hooks for RS group operations are in right place, we should > follow best practice and move permission checking to AccessController. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20654) Expose regions in transition thru JMX
Ted Yu created HBASE-20654: -- Summary: Expose regions in transition thru JMX Key: HBASE-20654 URL: https://issues.apache.org/jira/browse/HBASE-20654 Project: HBase Issue Type: Improvement Reporter: Ted Yu Currently only the count of regions in transition is exposed through JMX. Here is a sample snippet of the /jmx output:
{code}
{ "beans" : [ {
    ...
  }, {
    "name" : "Hadoop:service=HBase,name=Master,sub=AssignmentManager",
    "modelerType" : "Master,sub=AssignmentManager",
    "tag.Context" : "master",
    ...
    "ritCount" : 3
{code}
It would be desirable to expose the region name and state for the regions in transition as well. We can place a configurable upper bound on the number of entries returned in case there are a lot of regions in transition. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
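The bounded-list idea could be sketched as follows. The attribute name ritRegions, the "regionName=state" encoding, and the helper itself are illustrative assumptions, not the actual AssignmentManager metrics source.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: alongside ritCount, publish at most maxEntries
// "regionName=state" strings so the JMX payload stays bounded even when
// many regions are in transition.
public class RitMetricsSketch {
    static Map<String, Object> ritAttributes(Map<String, String> regionStates, int maxEntries) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("ritCount", regionStates.size()); // full count, as today
        List<String> sample = new ArrayList<>();
        for (Map.Entry<String, String> e : regionStates.entrySet()) {
            if (sample.size() >= maxEntries) break; // configurable upper bound
            sample.add(e.getKey() + "=" + e.getValue());
        }
        out.put("ritRegions", sample);
        return out;
    }
}
```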
[jira] [Created] (HBASE-20653) Add missing observer hooks for region server group to MasterObserver
Ted Yu created HBASE-20653: -- Summary: Add missing observer hooks for region server group to MasterObserver Key: HBASE-20653 URL: https://issues.apache.org/jira/browse/HBASE-20653 Project: HBase Issue Type: Bug Reporter: Ted Yu Currently the following region server group operations don't have corresponding hook in MasterObserver : * getRSGroupInfo * getRSGroupInfoOfServer * getRSGroupInfoOfTable * listRSGroup This JIRA is to * add them to MasterObserver * add corresponding permission check in AccessController * move the {{checkPermission}} out of RSGroupAdminEndpoint * add corresponding tests to TestRSGroupsWithACL -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20079) Report all the new test classes missing HBaseClassTestRule in one patch
[ https://issues.apache.org/jira/browse/HBASE-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-20079. Resolution: Later > Report all the new test classes missing HBaseClassTestRule in one patch > --- > > Key: HBASE-20079 > URL: https://issues.apache.org/jira/browse/HBASE-20079 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Priority: Trivial > > Currently if there are both new small and large tests without > HBaseClassTestRule in a single patch, the QA bot would report the small test > class as missing HBaseClassTestRule but not the large test. > All new test classes missing HBaseClassTestRule should be reported in the > same QA run. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20081) TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown
[ https://issues.apache.org/jira/browse/HBASE-20081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-20081. Resolution: Cannot Reproduce > TestDisableTableProcedure sometimes hung in MiniHBaseCluster#waitUntilShutDown > -- > > Key: HBASE-20081 > URL: https://issues.apache.org/jira/browse/HBASE-20081 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Priority: Major > > https://builds.apache.org/job/HBase-2.0-hadoop3-tests/lastCompletedBuild/org.apache.hbase$hbase-server/testReport/org.apache.hadoop.hbase.master.procedure/TestDisableTableProcedure/org_apache_hadoop_hbase_master_procedure_TestDisableTableProcedure/ > was one recent occurrence. > I noticed two things in test output: > {code} > 2018-02-25 18:12:45,053 WARN [Time-limited test-EventThread] > master.RegionServerTracker(136): asf912.gq1.ygridcore.net,45649,1519582305777 > is not online or isn't known to the master.The latter could be caused by a > DNS misconfiguration. > {code} > Since DNS misconfiguration was very unlikely on Apache Jenkins nodes, the > above should not have been logged. 
> {code} > 2018-02-25 18:16:51,531 WARN [master/asf912:0.Chore.1] > master.CatalogJanitor(127): Failed scan of catalog table > java.io.IOException: connection is closed > at > org.apache.hadoop.hbase.MetaTableAccessor.getMetaHTable(MetaTableAccessor.java:263) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:761) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMeta(MetaTableAccessor.java:680) > at > org.apache.hadoop.hbase.MetaTableAccessor.scanMetaForTableRegions(MetaTableAccessor.java:675) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:188) > at > org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:140) > at > org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:246) > at > org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:119) > at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186) > {code} > The above was possibly related to the lost region server. > I searched test output of successful run where none of the above two can be > seen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20644) Master shutdown due to service ClusterSchemaServiceImpl failing to start
Ted Yu created HBASE-20644: -- Summary: Master shutdown due to service ClusterSchemaServiceImpl failing to start Key: HBASE-20644 URL: https://issues.apache.org/jira/browse/HBASE-20644 Project: HBase Issue Type: Bug Reporter: Romil Choksi From hbase-hbase-master-ctr-e138-1518143905142-329221-01-03.hwx.site.log :
{code}
2018-05-23 22:14:29,750 ERROR [master/ctr-e138-1518143905142-329221-01-03:2] master.HMaster: Failed to become active master
java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1054)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:918)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2023)
{code}
Earlier in the log, the namespace region was deemed OPEN on 01-07.hwx.site,16020,1527112194788 which was declared not online:
{code}
2018-05-23 21:54:34,786 INFO [master/ctr-e138-1518143905142-329221-01-03:2] assignment.RegionStateStore: Load hbase:meta entry region=01a7f9ba9fffd691f261d3fbc620da06, regionState=OPEN, lastHost=ctr-e138-1518143905142-329221-01-07.hwx.site,16020,1527112194788, regionLocation=ctr-e138-1518143905142-329221-01-07.hwx.site,16020,1527112194788, seqnum=43
2018-05-23 21:54:34,787 INFO [master/ctr-e138-1518143905142-329221-01-03:2] assignment.AssignmentManager: Number of RegionServers=1
2018-05-23 21:54:34,788 INFO [master/ctr-e138-1518143905142-329221-01-03:2] assignment.AssignmentManager: KILL RegionServer=ctr-e138-1518143905142-329221-01-07.hwx.site,16020,1527112194788 hosting regions but not online.
{code}
Later, even though a different instance on 007 registered with master:
{code}
2018-05-23 21:55:13,541 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=2] master.ServerManager: Registering regionserver=ctr-e138-1518143905142-329221-01-07.hwx.site,16020,1527112506002
...
2018-05-23 21:55:43,881 INFO [master/ctr-e138-1518143905142-329221-01-03:2] client.RpcRetryingCallerImpl: Call exception, tries=12, retries=12, started=69001 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.NotServingRegionException: hbase:namespace,,1527099443383.01a7f9ba9fffd691f261d3fbc620da06. is not online on ctr-e138-1518143905142-329221-01-07.hwx.site,16020,1527112506002
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3273)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3250)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2446)
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
{code}
There was no OPEN request sent to that instance. From hbase-hbase-regionserver-ctr-e138-1518143905142-329221-01-07.hwx.site.log :
{code}
2018-05-23 21:52:27,414 INFO [RS_CLOSE_REGION-regionserver/ctr-e138-1518143905142-329221-01-07:16020-1] regionserver.HRegion: Closed hbase:namespace,,1527099443383.01a7f9ba9fffd691f261d3fbc620da06.
{code}
Then region server 007 restarted:
{code}
Wed May 23 21:55:03 UTC 2018 Starting regionserver on ctr-e138-1518143905142-329221-01-07.hwx.site
{code}
After which, the region 01a7f9ba9fffd691f261d3fbc620da06 never showed up again in the 007 log. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20639) Implement permission checking through AccessController instead of RSGroupAdminEndpoint
Ted Yu created HBASE-20639: -- Summary: Implement permission checking through AccessController instead of RSGroupAdminEndpoint Key: HBASE-20639 URL: https://issues.apache.org/jira/browse/HBASE-20639 Project: HBase Issue Type: Bug Reporter: Ted Yu Currently permission checking for various RS group operations is done via RSGroupAdminEndpoint. e.g. in RSGroupAdminServiceImpl#moveServers() : {code} checkPermission("moveServers"); groupAdminServer.moveServers(hostPorts, request.getTargetGroup()); {code} The practice in remaining parts of hbase is to perform permission checking within AccessController. Now that observer hooks for RS group operations are in right place, we should follow best practice and move permission checking to AccessController. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-20627) Relocate RS Group pre/post hooks from RSGroupAdminServer to RSGroupAdminEndpoint
[ https://issues.apache.org/jira/browse/HBASE-20627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-20627: > Relocate RS Group pre/post hooks from RSGroupAdminServer to > RSGroupAdminEndpoint > > > Key: HBASE-20627 > URL: https://issues.apache.org/jira/browse/HBASE-20627 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Major > Fix For: 2.1.0 > > Attachments: 20627.branch-1.txt, 20627.v1.txt, 20627.v2.txt, > 20627.v3.txt > > > Currently RS Group pre/post hooks are called from RSGroupAdminServer. > e.g. RSGroupAdminServer#removeRSGroup : > {code} > if (master.getMasterCoprocessorHost() != null) { > master.getMasterCoprocessorHost().preRemoveRSGroup(name); > } > {code} > RSGroupAdminServer#removeRSGroup is called by RSGroupAdminEndpoint : > {code} > checkPermission("removeRSGroup"); > groupAdminServer.removeRSGroup(request.getRSGroupName()); > {code} > If permission check fails, the pre hook wouldn't be called. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20627) Relocate RS Group pre/post hooks from RSGroupAdminServer to RSGroupAdminEndpoint
Ted Yu created HBASE-20627: -- Summary: Relocate RS Group pre/post hooks from RSGroupAdminServer to RSGroupAdminEndpoint Key: HBASE-20627 URL: https://issues.apache.org/jira/browse/HBASE-20627 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Attachments: 20627.v1.txt Currently RS Group pre/post hooks are called from RSGroupAdminServer. e.g. RSGroupAdminServer#removeRSGroup : {code} if (master.getMasterCoprocessorHost() != null) { master.getMasterCoprocessorHost().preRemoveRSGroup(name); } {code} RSGroupAdminServer#removeRSGroup is called by RSGroupAdminEndpoint : {code} checkPermission("removeRSGroup"); groupAdminServer.removeRSGroup(request.getRSGroupName()); {code} If permission check fails, the pre hook wouldn't be called. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
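The ordering problem can be illustrated with a recorder sketch. All names here are stand-ins for the real RSGroupAdminEndpoint / RSGroupAdminServer members, and running the pre hook before the permission check is one plausible ordering after the relocation (so a denied caller still triggers preRemoveRSGroup), not necessarily the exact ordering of the committed patch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: with the hooks relocated into the endpoint, the pre hook
// fires before checkPermission, so it can no longer be skipped when the
// permission check throws.
public class HookOrderSketch {
    final List<String> calls = new ArrayList<>();

    void preRemoveRSGroup(String name) { calls.add("preRemoveRSGroup"); }
    void checkPermission(String op) { calls.add("checkPermission"); }
    void removeRSGroup(String name) { calls.add("removeRSGroup"); }
    void postRemoveRSGroup(String name) { calls.add("postRemoveRSGroup"); }

    // Endpoint-side ordering after the relocation.
    void endpointRemoveRSGroup(String name) {
        preRemoveRSGroup(name);           // always fires first
        checkPermission("removeRSGroup"); // may throw AccessDeniedException
        removeRSGroup(name);              // server-side work
        postRemoveRSGroup(name);
    }
}
```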
[jira] [Created] (HBASE-20609) SnapshotHFileCleaner#init should check that params is not null
Ted Yu created HBASE-20609: -- Summary: SnapshotHFileCleaner#init should check that params is not null Key: HBASE-20609 URL: https://issues.apache.org/jira/browse/HBASE-20609 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Noticed the following in the test output of TestHFileArchiving :
{code}
SnapshotHFileCleaner.init(Map) line: 79
HFileCleaner(CleanerChore).newFileCleaner(String, Configuration) line: 260
HFileCleaner(CleanerChore).initCleanerChain(String) line: 232
HFileCleaner(CleanerChore).<init>(String, int, Stoppable, Configuration, FileSystem, Path, String, Map) line: 182
HFileCleaner.<init>(int, Stoppable, Configuration, FileSystem, Path, Map) line: 104
HFileCleaner.<init>(int, Stoppable, Configuration, FileSystem, Path) line: 51
TestHFileArchiving.testCleaningRace() line: 377
{code}
This was due to SnapshotHFileCleaner#init not checking the parameter {{params}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
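A null-safe init could look like the sketch below. The class, the "refreshPeriod" key, and the default value are all illustrative, not the actual SnapshotHFileCleaner fields.

```java
import java.util.Map;

// Sketch only: the params map may legitimately be null when the cleaner is
// constructed without extra parameters, so guard before dereferencing it.
public class CleanerInitSketch {
    private long cacheRefreshPeriod = 300_000L; // hypothetical default

    void init(Map<String, Object> params) {
        // Null check prevents the NPE seen in TestHFileArchiving.
        if (params != null && params.containsKey("refreshPeriod")) {
            cacheRefreshPeriod = (Long) params.get("refreshPeriod");
        }
    }

    long getCacheRefreshPeriod() { return cacheRefreshPeriod; }
}
```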
[jira] [Created] (HBASE-20578) Support region server group in target cluster
Ted Yu created HBASE-20578: -- Summary: Support region server group in target cluster Key: HBASE-20578 URL: https://issues.apache.org/jira/browse/HBASE-20578 Project: HBase Issue Type: Sub-task Reporter: Ted Yu When source tables belong to non-default region server group(s) and there are region server group counterparts in the target cluster, we should support restoring to the target cluster using the region server group mapping. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20552) HBase RegionServer was shutdown due to UnexpectedStateException
Ted Yu created HBASE-20552: -- Summary: HBase RegionServer was shutdown due to UnexpectedStateException Key: HBASE-20552 URL: https://issues.apache.org/jira/browse/HBASE-20552 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: Romil Choksi This was observed during cluster testing (source code sync'ed with hbase-2.0, built May 2nd): {code} 2018-05-02 05:44:10,089 ERROR [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=2] master.MasterRpcServices: Region server ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 reported a fatal error: * ABORTING region server ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474: org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=ctr-e138- 1518143905142-279227-01-07.hwx.site,16020,1525239609353, table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138- 1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has otherwise. at org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065) at org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987) at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459) at org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: rit=OPEN, location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353, table=test_hbase_ha_load_test_tool_hbase, 
region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has otherwise. at org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037) ... 7 more * Cause: org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: rit=OPEN, location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353, table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has otherwise. at org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1065) at org.apache.hadoop.hbase.master.assignment.AssignmentManager.reportOnlineRegions(AssignmentManager.java:987) at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:459) at org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:13118) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) Caused by: org.apache.hadoop.hbase.exceptions.UnexpectedStateException: rit=OPEN, location=ctr-e138-1518143905142-279227-01-07.hwx.site,16020,1525239609353, table=test_hbase_ha_load_test_tool_hbase, region=94f6ca283dbb4445b2bcdc321b734d28reported OPEN on server=ctr-e138-1518143905142-279227-01-02.hwx.site,16020,1525239334474 but state has otherwise. at org.apache.hadoop.hbase.master.assignment.AssignmentManager.checkOnlineRegionsReport(AssignmentManager.java:1037) ... 
7 more at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) {code} [~elserj] and I did some initial analysis. In the following description, M1 refers to master-ctr-e138-1518143905142-279227-01-05 and M2 refers to master-ctr-e138-1518143905142-279227-01-03. Let's follow region 94f6ca283dbb4445b2bcdc321b734d28 . Master 1 was moving the region to 07: {code} 2018-05-02 05:38:59,017 INFO [master/ctr-e138-1518143905142-279227-01-05:2.Chore.1] master.HMaster: balance hri=94f6ca283dbb4445b2bcdc321b734d28,
[jira] [Created] (HBASE-20530) Composition of backup directory incorrectly contains namespace when restoring
Ted Yu created HBASE-20530: -- Summary: Composition of backup directory incorrectly contains namespace when restoring Key: HBASE-20530 URL: https://issues.apache.org/jira/browse/HBASE-20530 Project: HBase Issue Type: Bug Reporter: Ted Yu Here is a partial listing of the output from an incremental backup: {code} 5306 2018-05-04 02:38 hdfs://mycluster/user/hbase/backup_loc/backup_1525401467793/table_almphxih4u/cf1/5648501da7194783947bbf07b172f07e {code} When restoring, here is what HBackupFileSystem.getTableBackupDir returns: {code} fileBackupDir=hdfs://mycluster/user/hbase/backup_loc/backup_1525401467793/default/table_almphxih4u {code} You can see that the namespace gets in the way, making it impossible to find the proper hfile. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
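The mismatch described above can be reduced to a path-composition sketch. The helper names below are illustrative stand-ins, not the actual HBackupFileSystem code:

```java
public class BackupPathMismatch {
    // What the incremental backup actually writes: no namespace component.
    static String writtenDir(String root, String backupId, String table) {
        return root + "/" + backupId + "/" + table;
    }

    // What getTableBackupDir computes on restore: namespace inserted.
    static String restoreDir(String root, String backupId, String ns, String table) {
        return root + "/" + backupId + "/" + ns + "/" + table;
    }

    public static void main(String[] args) {
        String root = "hdfs://mycluster/user/hbase/backup_loc";
        String written = writtenDir(root, "backup_1525401467793", "table_almphxih4u");
        String restored = restoreDir(root, "backup_1525401467793", "default", "table_almphxih4u");
        System.out.println(written);
        System.out.println(restored);
        // The extra "default" component means restore looks in the wrong place.
        System.out.println(written.equals(restored));
    }
}
```

Because the two compositions disagree on the namespace component, the restore path never matches the directory the backup actually wrote.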
[jira] [Created] (HBASE-20508) TestIncrementalBackupWithBulkLoad doesn't need to be Parameterized test
Ted Yu created HBASE-20508: -- Summary: TestIncrementalBackupWithBulkLoad doesn't need to be Parameterized test Key: HBASE-20508 URL: https://issues.apache.org/jira/browse/HBASE-20508 Project: HBase Issue Type: Test Components: backuprestore Reporter: Ted Yu TestIncrementalBackupWithBulkLoad is currently Parameterized with only one value returned from the data() method. In its ctor, this value is ignored, so TestIncrementalBackupWithBulkLoad doesn't need to be Parameterized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20495) REST unit test fails with NoClassDefFoundError against hadoop3
Ted Yu created HBASE-20495: -- Summary: REST unit test fails with NoClassDefFoundError against hadoop3 Key: HBASE-20495 URL: https://issues.apache.org/jira/browse/HBASE-20495 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu The following was first observed in the test output of rest.TestDeleteRow against hadoop3: {code} java.lang.NoClassDefFoundError: com/sun/jersey/core/spi/factory/AbstractRuntimeDelegate Caused by: java.lang.ClassNotFoundException: com.sun.jersey.core.spi.factory.AbstractRuntimeDelegate {code} This was due to the following transitive dependency on jersey 1.19: {code} [INFO] +- org.apache.hbase:hbase-testing-util:jar:2.0.0.3.0.0.0-SNAPSHOT:test [INFO] | +- org.apache.hbase:hbase-zookeeper:test-jar:tests:2.0.0.3.0.0.0-SNAPSHOT:test [INFO] | +- org.apache.hbase:hbase-hadoop-compat:test-jar:tests:2.0.0.3.0.0.0-SNAPSHOT:test [INFO] | +- org.apache.hbase:hbase-hadoop2-compat:test-jar:tests:2.0.0.3.0.0.0-SNAPSHOT:test [INFO] | +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:3.0.0:compile [INFO] | | \- org.apache.hadoop:hadoop-mapreduce-client-common:jar:3.0.0:compile [INFO] | +- org.apache.hadoop:hadoop-hdfs:test-jar:tests:3.0.0:test [INFO] | | \- com.sun.jersey:jersey-server:jar:1.19:compile {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20473) Ineffective INFO logging adjustment in HFilePerformanceEvaluation
[ https://issues.apache.org/jira/browse/HBASE-20473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-20473. Resolution: Not A Problem > Ineffective INFO logging adjustment in HFilePerformanceEvaluation > - > > Key: HBASE-20473 > URL: https://issues.apache.org/jira/browse/HBASE-20473 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > {code} > // Disable verbose INFO logging from org.apache.hadoop.io.compress.CodecPool > static { > System.setProperty("org.apache.commons.logging.Log", > "org.apache.commons.logging.impl.SimpleLog"); > {code} > The above code has no effect since we're migrating away from commons-logging. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20473) Ineffective INFO logging adjustment in HFilePerformanceEvaluation
Ted Yu created HBASE-20473: -- Summary: Ineffective INFO logging adjustment in HFilePerformanceEvaluation Key: HBASE-20473 URL: https://issues.apache.org/jira/browse/HBASE-20473 Project: HBase Issue Type: Bug Reporter: Ted Yu {code} // Disable verbose INFO logging from org.apache.hadoop.io.compress.CodecPool static { System.setProperty("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.SimpleLog"); {code} The above code has no effect since we're migrating away from commons-logging. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20436) IntegrationTestSparkBulkLoad cannot access abstract processOptions of AbstractHBaseTool
[ https://issues.apache.org/jira/browse/HBASE-20436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-20436. Resolution: Not A Problem > IntegrationTestSparkBulkLoad cannot access abstract processOptions of > AbstractHBaseTool > --- > > Key: HBASE-20436 > URL: https://issues.apache.org/jira/browse/HBASE-20436 > Project: HBase > Issue Type: Bug > Components: spark >Reporter: Ted Yu >Priority: Major > > Saw the following compilation error in hbase-spark-it module: > {code} > [ERROR] COMPILATION ERROR : > [INFO] - > [ERROR] > /hbase/hbase-spark-it/src/test/java/org/apache/hadoop/hbase/spark/IntegrationTestSparkBulkLoad.java:[638,10] > abstract method > processOptions(org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine) > in org.apache.hadoop.hbase.util.AbstractHBaseTool cannot be accessed directly > {code} > The processOptions method of AbstractHBaseTool is abstract. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20436) IntegrationTestSparkBulkLoad cannot access abstract processOptions of AbstractHBaseTool
Ted Yu created HBASE-20436: -- Summary: IntegrationTestSparkBulkLoad cannot access abstract processOptions of AbstractHBaseTool Key: HBASE-20436 URL: https://issues.apache.org/jira/browse/HBASE-20436 Project: HBase Issue Type: Bug Components: spark Reporter: Ted Yu Saw the following compilation error in hbase-spark-it module: {code} [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /hbase/hbase-spark-it/src/test/java/org/apache/hadoop/hbase/spark/IntegrationTestSparkBulkLoad.java:[638,10] abstract method processOptions(org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine) in org.apache.hadoop.hbase.util.AbstractHBaseTool cannot be accessed directly {code} The processOptions method of AbstractHBaseTool is abstract. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20414) TestLockProcedure#testMultipleLocks may fail on slow machine
Ted Yu created HBASE-20414: -- Summary: TestLockProcedure#testMultipleLocks may fail on slow machine Key: HBASE-20414 URL: https://issues.apache.org/jira/browse/HBASE-20414 Project: HBase Issue Type: Test Reporter: Ted Yu Here is a recent failure: https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/172/testReport/junit/org.apache.hadoop.hbase.master.locking/TestLockProcedure/health_checks___yetus_jdk8_hadoop2_checks___testMultipleLocks/ {code} java.lang.AssertionError: expected: but was: at org.apache.hadoop.hbase.master.locking.TestLockProcedure.sendHeartbeatAndCheckLocked(TestLockProcedure.java:221) at org.apache.hadoop.hbase.master.locking.TestLockProcedure.testMultipleLocks(TestLockProcedure.java:311) {code} In the test output, we can see this: {code} 2018-04-13 20:19:54,230 DEBUG [Time-limited test] locking.TestLockProcedure(225): Proc id 22 : LOCKED. ... 2018-04-13 20:19:55,529 DEBUG [Time-limited test] procedure2.ProcedureExecutor(865): Stored pid=26, state=RUNNABLE; org.apache.hadoop.hbase.master.locking.LockProcedure regions=a7f9adfd047350eabb199a39564ba4db,c1e297609590b707233a2f9c8bb51fa1, type=EXCLUSIVE 2018-04-13 20:19:56,230 DEBUG [ProcExecTimeout] locking.LockProcedure(207): Timeout failure ProcedureEvent for pid=22, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE, ready=false, [pid=22, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.master.locking.LockProcedure, namespace=namespace, type=EXCLUSIVE] {code} After the pid=26 log, the code does this (1 second wait): {code} // Assert tables & region locks are waiting because of namespace lock. Thread.sleep(HEARTBEAT_TIMEOUT / 2); {code} On a slow machine (in the case above), there was only 730 msec between the queueing of regionsLock2 and the WAITING_TIMEOUT state of the nsLock. The 1 second wait was too long, leading to the assertion failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
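A common remedy for this kind of timing flakiness is to poll for the expected state with a deadline rather than sleeping a fixed interval. The sketch below is a generic pattern under that assumption, not the actual TestLockProcedure fix:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class WaitFor {
    // Poll until the condition holds or the deadline expires, instead of
    // a fixed Thread.sleep() that can be too long on a slow machine.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        return condition.getAsBoolean();
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean reachedTimeout = new AtomicBoolean(false);
        // Simulate the procedure reaching WAITING_TIMEOUT after roughly 200 ms.
        new Thread(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            reachedTimeout.set(true);
        }).start();
        // Returns as soon as the state is observed, rather than always waiting a full second.
        System.out.println(waitFor(reachedTimeout::get, 1000, 20));
    }
}
```

The poll returns as soon as the state appears, so the test neither overshoots the heartbeat window nor fails when the state arrives late.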
[jira] [Created] (HBASE-20375) Remove use of getCurrentUserCredentials in hbase-spark module
Ted Yu created HBASE-20375: -- Summary: Remove use of getCurrentUserCredentials in hbase-spark module Key: HBASE-20375 URL: https://issues.apache.org/jira/browse/HBASE-20375 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu When compiling hbase-spark module against Spark 2.3.0 release, we would get: {code} [ERROR] /a/hbase/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:68: error: value getCurrentUserCredentials is not a member of org.apache.spark.deploy.SparkHadoopUtil [ERROR] @transient var credentials = SparkHadoopUtil.get.getCurrentUserCredentials() [ERROR]^ [ERROR] /a/hbase/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:236: error: value getCurrentUserCredentials is not a member of org.apache.spark.deploy. SparkHadoopUtil [ERROR] credentials = SparkHadoopUtil.get.getCurrentUserCredentials() [ERROR] ^ [ERROR] two errors found {code} {{getCurrentUserCredentials}} was removed by SPARK-22372. This issue is to replace the call to {{getCurrentUserCredentials}} with a call to {{UserGroupInformation.getCurrentUser().getCredentials()}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20325) ReassignPartitionsClusterTest#shouldMoveSubsetOfPartitions is flaky
Ted Yu created HBASE-20325: -- Summary: ReassignPartitionsClusterTest#shouldMoveSubsetOfPartitions is flaky Key: HBASE-20325 URL: https://issues.apache.org/jira/browse/HBASE-20325 Project: HBase Issue Type: Test Reporter: Ted Yu Saw this from https://builds.apache.org/job/kafka-trunk-jdk8/2518/testReport/junit/kafka.admin/ReassignPartitionsClusterTest/shouldMoveSubsetOfPartitions/ : {code} kafka.common.AdminCommandFailedException: Partition reassignment currently in progress for Map(topic1-0 -> Buffer(100, 102), topic1-2 -> Buffer(100, 102), topic2-1 -> Buffer(101, 100), topic2-2 -> Buffer(100, 102)). Aborting operation at kafka.admin.ReassignPartitionsCommand.reassignPartitions(ReassignPartitionsCommand.scala:612) at kafka.admin.ReassignPartitionsCommand$.executeAssignment(ReassignPartitionsCommand.scala:215) at kafka.admin.ReassignPartitionsClusterTest.shouldMoveSubsetOfPartitions(ReassignPartitionsClusterTest.scala:242) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-20159) Support using separate ZK quorums for client
[ https://issues.apache.org/jira/browse/HBASE-20159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-20159: > Support using separate ZK quorums for client > > > Key: HBASE-20159 > URL: https://issues.apache.org/jira/browse/HBASE-20159 > Project: HBase > Issue Type: New Feature > Components: Client, Operability, Zookeeper >Reporter: Yu Li >Assignee: Yu Li >Priority: Major > Fix For: 3.0.0, 2.1.0 > > Attachments: 20159.addendum, 20159.addendum2.patch, > HBASE-20159.patch, HBASE-20159.v2.patch, HBASE-20159.v3.patch > > > Currently we are using the same zookeeper quorums for client and server, > which makes us under risk that if some client connection boost exhausted > zookeeper, RegionServer might abort due to zookeeper session loss. Actually > we have suffered from this many times in production. > Here we propose to allow client to use different ZK quorums, through below > settings: > {noformat} > hbase.client.zookeeper.quorum > hbase.client.zookeeper.property.clientPort > hbase.client.zookeeper.observer.mode > {noformat} > The first two are for specifying client zookeeper properties, and the third > one indicating whether the client ZK nodes are in observer mode. If the > client ZK are not observer nodes, HMaster will take responsibility to > synchronize necessary meta information (such as meta location and master > address, etc.) from server to client ZK -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HBASE-20123) Backup test fails against hadoop 3
[ https://issues.apache.org/jira/browse/HBASE-20123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-20123. Resolution: Duplicate Should be fixed by HADOOP-15289 > Backup test fails against hadoop 3 > -- > > Key: HBASE-20123 > URL: https://issues.apache.org/jira/browse/HBASE-20123 > Project: HBase > Issue Type: Bug >Reporter: Ted Yu >Priority: Major > > When running backup unit test against hadoop3, I saw: > {code} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 88.862 s <<< FAILURE! - in > org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes > [ERROR] > testBackupMultipleDeletes(org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes) > Time elapsed: 86.206 s <<< ERROR! > java.io.IOException: java.io.IOException: Failed copy from > hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 to > hdfs://localhost:40578/backupUT > at > org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes.testBackupMultipleDeletes(TestBackupMultipleDeletes.java:82) > Caused by: java.io.IOException: Failed copy from > hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 to > hdfs://localhost:40578/backupUT > at > org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes.testBackupMultipleDeletes(TestBackupMultipleDeletes.java:82) > {code} > In the test output, I found: > {code} > 2018-03-03 14:46:10,858 ERROR [Time-limited test] > mapreduce.MapReduceBackupCopyJob$BackupDistCp(237): java.io.IOException: Path > hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 is not a symbolic > link > java.io.IOException: Path > hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 is not a symbolic > link > at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:338) > at org.apache.hadoop.fs.FileStatus.readFields(FileStatus.java:461) > at > org.apache.hadoop.tools.CopyListingFileStatus.readFields(CopyListingFileStatus.java:155) > at > 
org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2308) > at > org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:163) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:91) > at > org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90) > at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84) > at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382) > at > org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob$BackupDistCp.createInputFileListing(MapReduceBackupCopyJob.java:297) > at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181) > at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153) > at > org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob$BackupDistCp.execute(MapReduceBackupCopyJob.java:196) > at org.apache.hadoop.tools.DistCp.run(DistCp.java:126) > at > org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob.copy(MapReduceBackupCopyJob.java:408) > at > org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.incrementalCopyHFiles(IncrementalTableBackupClient.java:348) > at > org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:290) > at > org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:605) > {code} > It seems the failure was related to how we use distcp. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20272) TestAsyncTable#testCheckAndMutateWithTimeRange fails due to TableExistsException
Ted Yu created HBASE-20272: -- Summary: TestAsyncTable#testCheckAndMutateWithTimeRange fails due to TableExistsException Key: HBASE-20272 URL: https://issues.apache.org/jira/browse/HBASE-20272 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu The following test failure is reproducible: {code} org.apache.hadoop.hbase.TableExistsException: testCheckAndMutateWithTimeRange at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.prepareCreate(CreateTableProcedure.java:233) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:87) at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.executeFromState(CreateTableProcedure.java:51) at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184) at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1453) {code} The cause was that TestAsyncTable is parameterized while testCheckAndMutateWithTimeRange uses the same table name without dropping the table after the first invocation finishes. This leads to the second invocation failing with TableExistsException. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20257) hbase-spark should not depend on com.google.code.findbugs.jsr305
Ted Yu created HBASE-20257: -- Summary: hbase-spark should not depend on com.google.code.findbugs.jsr305 Key: HBASE-20257 URL: https://issues.apache.org/jira/browse/HBASE-20257 Project: HBase Issue Type: Bug Reporter: Ted Yu The following can be observed in the build output of the master branch: {code} [WARNING] Rule 0: org.apache.maven.plugins.enforcer.BannedDependencies failed with message: We don't allow the JSR305 jar from the Findbugs project, see HBASE-16321. Found Banned Dependency: com.google.code.findbugs:jsr305:jar:1.3.9 Use 'mvn dependency:tree' to locate the source of the banned dependencies. {code} Here is the related snippet from hbase-spark/pom.xml: {code} <groupId>com.google.code.findbugs</groupId> <artifactId>jsr305</artifactId> {code} The dependency on jsr305 should be dropped. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20244) NoSuchMethodException when retrieving private method decryptEncryptedDataEncryptionKey from DFSClient
Ted Yu created HBASE-20244: -- Summary: NoSuchMethodException when retrieving private method decryptEncryptedDataEncryptionKey from DFSClient Key: HBASE-20244 URL: https://issues.apache.org/jira/browse/HBASE-20244 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu I was running unit test against hadoop 3.0.1 RC and saw the following in test output: {code} ERROR [RS-EventLoopGroup-3-3] asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper(267): Couldn't properly initialize access to HDFS internals. Please update your WAL Provider to not make use of the 'asyncfs' provider. See HBASE-16110 for more information. java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(org.apache.hadoop.fs.FileEncryptionInfo) at java.lang.Class.getDeclaredMethod(Class.java:2130) at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper.createTransparentCryptoHelper(FanOutOneBlockAsyncDFSOutputSaslHelper.java:232) at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputSaslHelper.(FanOutOneBlockAsyncDFSOutputSaslHelper.java:262) at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.initialize(FanOutOneBlockAsyncDFSOutputHelper.java:661) at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.access$300(FanOutOneBlockAsyncDFSOutputHelper.java:118) at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$13.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:720) at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$13.operationComplete(FanOutOneBlockAsyncDFSOutputHelper.java:715) at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500) at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479) at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420) at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104) at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82) at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:306) at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:341) at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633) at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580) {code} The private method was moved by HDFS-12574 to HdfsKMSUtil with a different signature. To accommodate the above method movement, it seems we need to call the following method of DFSClient: {code} public KeyProvider getKeyProvider() throws IOException { {code} Since the new decryptEncryptedDataEncryptionKey method has this signature: {code} static KeyVersion decryptEncryptedDataEncryptionKey(FileEncryptionInfo feInfo, KeyProvider keyProvider) throws IOException { {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
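The lookup-then-fallback pattern implied above (try the old private method, fall back to the relocated one with the extra KeyProvider argument) can be illustrated with a self-contained sketch. OldHome and NewHome are hypothetical stand-ins, not HDFS classes:

```java
import java.lang.reflect.Method;

public class ReflectiveFallback {
    // Stand-in for DFSClient on hadoop 3: the old private method no longer exists here.
    static class OldHome {
    }

    // Stand-in for HdfsKMSUtil: same operation, different signature.
    static class NewHome {
        private static String decrypt(String feInfo, String keyProvider) {
            return "new:" + feInfo + ":" + keyProvider;
        }
    }

    public static void main(String[] args) throws Exception {
        Method m;
        Object[] invokeArgs;
        try {
            // Old location/signature: decrypt(feInfo)
            m = OldHome.class.getDeclaredMethod("decrypt", String.class);
            invokeArgs = new Object[] { "feInfo" };
        } catch (NoSuchMethodException e) {
            // Relocated method takes the KeyProvider as an extra parameter.
            m = NewHome.class.getDeclaredMethod("decrypt", String.class, String.class);
            invokeArgs = new Object[] { "feInfo", "keyProvider" };
        }
        m.setAccessible(true);
        System.out.println(m.invoke(null, invokeArgs));
    }
}
```

Catching NoSuchMethodException and retrying against the new home mirrors how asyncfs support code can stay compatible with both method locations.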
[jira] [Reopened] (HBASE-20214) Review of RegionLocationFinder Class
[ https://issues.apache.org/jira/browse/HBASE-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-20214: > Review of RegionLocationFinder Class > > > Key: HBASE-20214 > URL: https://issues.apache.org/jira/browse/HBASE-20214 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Affects Versions: 2.0.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Fix For: 3.0.0 > > Attachments: HBASE-20214.1.patch > > > # Use SLF4J parameter logging > # Remove superfluous code > # Replace code with re-usable libraries where possible > # Use different data structure > # Small perf improvements > # Fix some checkstyle -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20196) Maintain all regions with same size in memstore flusher
Ted Yu created HBASE-20196: -- Summary: Maintain all regions with same size in memstore flusher Key: HBASE-20196 URL: https://issues.apache.org/jira/browse/HBASE-20196 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Here is the javadoc for getCopyOfOnlineRegionsSortedByOffHeapSize(): {code} * the biggest. If two regions are the same size, then the last one found wins; i.e. this * method may NOT return all regions. {code} Currently the value type is HRegion - we only store one region per size. I think we should change the value type to Collection so that we don't miss any region (potentially with a big size). e.g. Suppose there are three regions (R1, R2 and R3) with sizes 100, 100 and 1, respectively. Using the current data structure, R2 would be stored in the Map, evicting R1 from the Map. This means that the current code would choose to flush regions R2 and R3, releasing 101 from memory. If the value type is changed to Collection, we would flush both R1 and R2. This achieves faster memory reclamation. Confirmed with [~eshcar] over in HBASE-20090 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
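The proposed data-structure change can be sketched with plain types, using region names as strings standing in for HRegion. This is an illustrative sketch of the idea, not the actual HBase code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

public class FlushCandidates {
    static void add(TreeMap<Long, List<String>> map, String region, long size) {
        // Keep ALL regions of a given size, rather than size -> single region
        // (which silently drops ties, as the javadoc warns).
        map.computeIfAbsent(size, k -> new ArrayList<>()).add(region);
    }

    public static void main(String[] args) {
        TreeMap<Long, List<String>> bySize = new TreeMap<>(Collections.reverseOrder());
        add(bySize, "R1", 100L);
        add(bySize, "R2", 100L);  // same size as R1 -- no longer evicts R1
        add(bySize, "R3", 1L);

        // Pick flush candidates from the biggest bucket first.
        List<String> toFlush = bySize.firstEntry().getValue();
        System.out.println(toFlush);  // both R1 and R2, releasing 200 instead of 101
    }
}
```

With a Collection value, the R1/R2 tie from the example above survives, so the flusher can reclaim 200 from memory instead of 101.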
[jira] [Resolved] (HBASE-20104) Fix infinite loop of RIT when creating table on a rsgroup that has no online servers
[ https://issues.apache.org/jira/browse/HBASE-20104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-20104. Resolution: Fixed Fix Version/s: (was: 1.4.3) Reverted from branch-1 and branch-1.4 Xiaolin: If you want to backport the patch, please open another JIRA. This was marked fixed for beta2 which has shipped. > Fix infinite loop of RIT when creating table on a rsgroup that has no online > servers > > > Key: HBASE-20104 > URL: https://issues.apache.org/jira/browse/HBASE-20104 > Project: HBase > Issue Type: Bug > Components: rsgroup >Affects Versions: 2.0.0-beta-2 >Reporter: Xiaolin Ha >Assignee: Xiaolin Ha >Priority: Major > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-20104.branch-1.001.patch, > HBASE-20104.branch-1.4.001.patch, HBASE-20104.branch-2.001.patch, > HBASE-20104.branch-2.002.patch > > > This error has been reported in > https://builds.apache.org/job/PreCommit-HBASE-Build/11635/testReport/org.apache.hadoop.hbase.rsgroup/TestRSGroups/org_apache_hadoop_hbase_rsgroup_TestRSGroups/ > Cases that creating tables on a rsgroup which has been stopped or > decommissioned all region servers can reproduce this error. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (HBASE-20104) Fix infinite loop of RIT when creating table on a rsgroup that has no online servers
[ https://issues.apache.org/jira/browse/HBASE-20104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-20104: > Fix infinite loop of RIT when creating table on a rsgroup that has no online > servers > > > Key: HBASE-20104 > URL: https://issues.apache.org/jira/browse/HBASE-20104 > Project: HBase > Issue Type: Bug > Components: rsgroup >Affects Versions: 2.0.0-beta-2 >Reporter: Xiaolin Ha >Assignee: Xiaolin Ha >Priority: Major > Fix For: 2.0.0-beta-2, 1.4.3 > > Attachments: HBASE-20104.branch-1.001.patch, > HBASE-20104.branch-1.4.001.patch, HBASE-20104.branch-2.001.patch, > HBASE-20104.branch-2.002.patch > > > This error has been reported in > https://builds.apache.org/job/PreCommit-HBASE-Build/11635/testReport/org.apache.hadoop.hbase.rsgroup/TestRSGroups/org_apache_hadoop_hbase_rsgroup_TestRSGroups/ > Cases that creating tables on a rsgroup which has been stopped or > decommissioned all region servers can reproduce this error. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20176) Fix warnings about Logging import in hbase-spark test code
Ted Yu created HBASE-20176: -- Summary: Fix warnings about Logging import in hbase-spark test code Key: HBASE-20176 URL: https://issues.apache.org/jira/browse/HBASE-20176 Project: HBase Issue Type: Test Reporter: Ted Yu This is a follow-on to HBASE-16179. In HBASE-16179 we fixed warnings in non-test code in the following form: {code} warning: imported `Logging' is permanently hidden by definition of trait Logging in package spark {code} However, there are a few warnings not detected by the precommit bot: {code} [WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseConnectionCacheSuite.scala:25: warning: imported `Logging' is permanently hidden by definition of trait Logging in package spark [WARNING] import org.apache.hadoop.hbase.spark.Logging [WARNING] ^ [WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseContextSuite.scala:23: warning: imported `Logging' is permanently hidden by definition of object Logging in package spark [WARNING] import org.apache.hadoop.hbase.spark.Logging [WARNING] ^ [WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseContextSuite.scala:23: warning: imported `Logging' is permanently hidden by definition of trait Logging in package spark [WARNING] import org.apache.hadoop.hbase.spark.Logging [WARNING] ^ [WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseDStreamFunctionsSuite.scala:20: warning: imported `Logging' is permanently hidden by definition of object Logging in package spark [WARNING] import org.apache.hadoop.hbase.spark.Logging [WARNING] ^ [WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseDStreamFunctionsSuite.scala:20: warning: imported `Logging' is permanently hidden by definition of trait Logging in package spark [WARNING] import org.apache.hadoop.hbase.spark.Logging [WARNING] ^ [WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctionsSuite.scala:20: warning: 
imported `Logging' is permanently hidden by definition of object Logging in package spark [WARNING] import org.apache.hadoop.hbase.spark.Logging [WARNING] ^ [WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctionsSuite.scala:20: warning: imported `Logging' is permanently hidden by definition of trait Logging in package spark [WARNING] import org.apache.hadoop.hbase.spark.Logging [WARNING] ^ [WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/PartitionFilterSuite.scala:21: warning: imported `Logging' is permanently hidden by definition of object Logging in package spark [WARNING] import org.apache.hadoop.hbase.spark.Logging [WARNING] ^ [WARNING] /a/hbase/hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/PartitionFilterSuite.scala:21: warning: imported `Logging' is permanently hidden by definition of trait Logging in package spark [WARNING] import org.apache.hadoop.hbase.spark.Logging {code} This issue is to fix the above warnings in test code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20136) TestKeyValue misses ClassRule and Category annotations
Ted Yu created HBASE-20136: -- Summary: TestKeyValue misses ClassRule and Category annotations Key: HBASE-20136 URL: https://issues.apache.org/jira/browse/HBASE-20136 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu hbase-common/src/test/java/org/apache/hadoop/hbase/TestKeyValue.java misses ClassRule and Category annotations. This issue adds the annotations to this test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20123) Backup test fails against hadoop 3
Ted Yu created HBASE-20123: -- Summary: Backup test fails against hadoop 3 Key: HBASE-20123 URL: https://issues.apache.org/jira/browse/HBASE-20123 Project: HBase Issue Type: Bug Reporter: Ted Yu When running backup unit test against hadoop3, I saw: {code} [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 88.862 s <<< FAILURE! - in org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes [ERROR] testBackupMultipleDeletes(org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes) Time elapsed: 86.206 s <<< ERROR! java.io.IOException: java.io.IOException: Failed copy from hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 to hdfs://localhost:40578/backupUT at org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes.testBackupMultipleDeletes(TestBackupMultipleDeletes.java:82) Caused by: java.io.IOException: Failed copy from hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 to hdfs://localhost:40578/backupUT at org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes.testBackupMultipleDeletes(TestBackupMultipleDeletes.java:82) {code} In the test output, I found: {code} 2018-03-03 14:46:10,858 ERROR [Time-limited test] mapreduce.MapReduceBackupCopyJob$BackupDistCp(237): java.io.IOException: Path hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 is not a symbolic link java.io.IOException: Path hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 is not a symbolic link at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:338) at org.apache.hadoop.fs.FileStatus.readFields(FileStatus.java:461) at org.apache.hadoop.tools.CopyListingFileStatus.readFields(CopyListingFileStatus.java:155) at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2308) at org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:163) at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:91) at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90) at 
org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84) at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382) at org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob$BackupDistCp.createInputFileListing(MapReduceBackupCopyJob.java:297) at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181) at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153) at org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob$BackupDistCp.execute(MapReduceBackupCopyJob.java:196) at org.apache.hadoop.tools.DistCp.run(DistCp.java:126) at org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob.copy(MapReduceBackupCopyJob.java:408) at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.incrementalCopyHFiles(IncrementalTableBackupClient.java:348) at org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:290) at org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:605) {code} It seems the failure was related to how we use distcp. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20121) Fix findbugs warning for RestoreTablesClient
Ted Yu created HBASE-20121: -- Summary: Fix findbugs warning for RestoreTablesClient Key: HBASE-20121 URL: https://issues.apache.org/jira/browse/HBASE-20121 Project: HBase Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ted Yu In RestoreTablesClient#restore(), the following variable is not used: {code} Set backupIdSet = new HashSet<>(); {code} There is a backupIdSet#add() call later in the method, but the variable doesn't appear in any other part of the code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)