[jira] [Created] (HBASE-28055) Performance improvement for scan over several stores.
Sergey Soldatov created HBASE-28055:
---

Summary: Performance improvement for scan over several stores.
Key: HBASE-28055
URL: https://issues.apache.org/jira/browse/HBASE-28055
Project: HBase
Issue Type: Bug
Affects Versions: 2.5.5, 3.0.0-alpha-4
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov

During the fix of HBASE-19863, an additional check for fake cells that trigger reseek was added. It turns out that this check produces unnecessary reseeks, because matcher.compareKeyForNextColumn should be used only with indexed keys. Later [~larsh] suggested doing a simple check for OLD_TIMESTAMP, which looks like a better solution.
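A minimal sketch of where such a timestamp check could live, assuming the fake cells mentioned above are created with HConstants.OLDEST_TIMESTAMP (the report says "OLD_TIMESTAMP"). This is an illustration of the idea, not the committed patch; the surrounding names follow the StoreScanner.trySkipToNextColumn code quoted in HBASE-19863 below.

{code}
// Hypothetical guard inside StoreScanner.trySkipToNextColumn(Cell cell):
// a fake cell injected by the ROWCOL bloom filter optimization carries
// OLDEST_TIMESTAMP, so a cheap timestamp check can skip the reseek that
// matcher.compareKeyForNextColumn would otherwise trigger for it.
Cell nextIndexedKey = getNextIndexedKey();
boolean fakeCell = cell.getTimestamp() == HConstants.OLDEST_TIMESTAMP;
if (!fakeCell && nextIndexedKey != null && nextIndexedKey != KeyValueScanner.NO_NEXT_INDEXED_KEY
    && matcher.compareKeyForNextColumn(nextIndexedKey, cell) >= 0) {
  this.heap.next();
  ++kvsScanned;
}
{code}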
[jira] [Created] (HBASE-27630) hbase-spark bulkload stage directory limited to hdfs only
Sergey Soldatov created HBASE-27630:
---

Summary: hbase-spark bulkload stage directory limited to hdfs only
Key: HBASE-27630
URL: https://issues.apache.org/jira/browse/HBASE-27630
Project: HBase
Issue Type: Bug
Components: spark
Affects Versions: 3.0.0-alpha-3
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov

It's impossible to point the staging directory for the bulkload operation in the spark-hbase connector to any filesystem other than HDFS. That might be a problem for deployments where hbase.rootdir points to cloud storage. In this case, an additional copy task from HDFS to cloud storage would be required before loading hfiles into HBase.
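A hedged sketch of the underlying idea: resolve the filesystem from the staging path's own URI instead of assuming the cluster default (HDFS) filesystem. The stagingUri value is a hypothetical example; whatever option the connector ends up exposing may be named differently.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Resolve the FS by the scheme of the staging path itself, so a
// cloud-storage staging dir works the same way as an HDFS one.
Configuration conf = new Configuration();
String stagingUri = "s3a://my-bucket/hbase-staging"; // hypothetical value
Path stagingDir = new Path(stagingUri);
FileSystem stagingFs = stagingDir.getFileSystem(conf); // honors the URI scheme
{code}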
[jira] [Resolved] (HBASE-20719) HTable.batch() doesn't handle TableNotFound correctly.
[ https://issues.apache.org/jira/browse/HBASE-20719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Soldatov resolved HBASE-20719.
-
Resolution: Duplicate

> HTable.batch() doesn't handle TableNotFound correctly.
> --
>
> Key: HBASE-20719
> URL: https://issues.apache.org/jira/browse/HBASE-20719
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.1.0
> Reporter: Sergey Soldatov
> Assignee: Sergey Soldatov
> Priority: Minor
>
> batch() as well as delete() are processed using AsyncRequest. Problems are
> reported via RetriesExhaustedWithDetailsException, and there is no special
> handling for the TableNotFound exception. So the final result of running
> batch or delete operations against a nonexistent table looks really weird
> and misleading:
> {noformat}
> hbase(main):003:0> delete 't1', 'r1', 'c1'
> 2018-06-12 15:02:50,742 ERROR [main] client.AsyncRequestFutureImpl: Cannot
> get replica 0 location for
> {"totalColumns":1,"row":"r1","families":{"c1":[{"qualifier":"","vlen":0,"tag":[],"timestamp":9223372036854775807}]},"ts":9223372036854775807}
> ERROR: Failed 1 action: t1: 1 time, servers with issues: null
> {noformat}
[jira] [Resolved] (HBASE-20926) IntegrationTestRSGroup is broken
[ https://issues.apache.org/jira/browse/HBASE-20926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Soldatov resolved HBASE-20926.
-
Resolution: Invalid

The class was refactored into a number of smaller tests.

> IntegrationTestRSGroup is broken
> -
>
> Key: HBASE-20926
> URL: https://issues.apache.org/jira/browse/HBASE-20926
> Project: HBase
> Issue Type: Bug
> Components: integration tests
> Affects Versions: 2.1.0
> Reporter: Sergey Soldatov
> Assignee: Sergey Soldatov
> Priority: Major
> Fix For: 3.0.0-alpha-4
>
> There are several problems:
> 1. It doesn't work in minicluster mode, because afterMethod() uses
> IntegrationTestingUtility.restoreCluster(), which simply shuts down the
> minicluster when not in distributed mode.
> 2. It uses tests from TestRSGroups, which was supposed to be common for both
> unit and integration tests, but over the last two years a number of tests
> were added that use internal APIs and are not compatible with distributed
> mode.
[jira] [Created] (HBASE-27397) Spark-hbase support for 'startWith' predicate.
Sergey Soldatov created HBASE-27397:
---

Summary: Spark-hbase support for 'startWith' predicate.
Key: HBASE-27397
URL: https://issues.apache.org/jira/browse/HBASE-27397
Project: HBase
Issue Type: New Feature
Components: hbase-connectors
Affects Versions: 3.0.0-alpha-3
Reporter: Sergey Soldatov

Currently, the spark-hbase connector doesn't support the stringStartWith predicate and completely ignores it. This is a disadvantage compared to the Apache Phoenix connector and the old SHC (Hortonworks spark-hbase connector). It would be nice to have this functionality in the current connector as well.
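For context, a startWith predicate on the row key maps naturally onto an HBase prefix scan, which is presumably what a pushdown implementation would emit. A minimal sketch using the standard client API (the prefix value is an example):

{code}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// A StringStartsWith predicate on the row key can be pushed down as a
// prefix scan instead of being ignored: setRowPrefixFilter derives both
// the start row and the stop row from the given prefix.
byte[] prefix = Bytes.toBytes("user_");
Scan scan = new Scan().setRowPrefixFilter(prefix);
{code}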
[jira] [Created] (HBASE-27061) two phase bulkload is broken when SFT is in use.
Sergey Soldatov created HBASE-27061:
---

Summary: two phase bulkload is broken when SFT is in use.
Key: HBASE-27061
URL: https://issues.apache.org/jira/browse/HBASE-27061
Project: HBase
Issue Type: Bug
Affects Versions: 2.4.12
Reporter: Sergey Soldatov

In HBASE-26707, for the SFT case, we write files directly to the region location, and for that we use HRegion.regionDir as the staging directory. The problem is that this directory actually points to the WAL directory, so for S3 deployments it points to HDFS. As a result, LoadIncrementalHFiles fails during execution with the exception:
{noformat}
2022-05-24 03:31:23,656 ERROR org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager: Failed to complete bulk load
java.lang.IllegalArgumentException: Wrong FS hdfs://ns1//hbase-wals/data/default/employees/4f367b303da4fed7667fff07fd4c6066/department/acd971097924463da6d6e3a15f9527da -expected s3a://hbase
	at org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1375)
	at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:647)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1337)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1363)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3521)
	at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:511)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:397)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
	at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$SecureBulkLoadListener.prepareBulkLoad(SecureBulkLoadManager.java:397)
	at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:6994)
	at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:291)
	at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:360)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1879)
	at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.secureBulkLoadHFiles(SecureBulkLoadManager.java:266)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2453)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45821)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:140)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339)
{noformat}
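A hedged sketch of the filesystem mismatch described above, using the standard config keys. This only illustrates why the "Wrong FS" check trips; the staging directory has to be created on the root filesystem, not the WAL one.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

// The staging path must live on the same filesystem as hbase.rootdir,
// not on the WAL filesystem; on S3 deployments the two differ.
Configuration conf = HBaseConfiguration.create();
Path rootDir = new Path(conf.get("hbase.rootdir"));  // e.g. s3a://hbase
Path walDir = new Path(conf.get("hbase.wal.dir"));   // e.g. hdfs://ns1/hbase-wals
FileSystem rootFs = rootDir.getFileSystem(conf);
// Copying from a staging dir created under walDir into rootFs trips the
// "Wrong FS" check; the staging dir must be created with rootFs instead.
{code}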
[jira] [Created] (HBASE-27053) IOException during caching of uncompressed block to the block cache.
Sergey Soldatov created HBASE-27053:
---

Summary: IOException during caching of uncompressed block to the block cache.
Key: HBASE-27053
URL: https://issues.apache.org/jira/browse/HBASE-27053
Project: HBase
Issue Type: Bug
Components: BlockCache
Affects Versions: 2.4.12
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov

When prefetch to the block cache is enabled and blocks are compressed, caching sometimes fails with the exception:
{noformat}
2022-05-18 21:37:29,597 ERROR [RS_OPEN_REGION-regionserver/x1:16020-2] regionserver.HRegion: Could not initialize all stores for the region=cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7.
2022-05-18 21:37:29,598 WARN [RS_OPEN_REGION-regionserver/x1:16020-2] regionserver.HRegion: Failed initialize of region= cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7., starting to roll back memstore
java.io.IOException: java.io.IOException: java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
	at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1149)
	at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1092)
	at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:996)
	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:946)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7240)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7199)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7175)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7134)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7090)
	at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:147)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:100)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
	at org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:294)
	at org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:344)
	at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:294)
	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:6375)
	at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1115)
	at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1112)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	... 3 more
Caused by: java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
	at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:199)
	at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:231)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.shouldReplaceExistingCacheBlock(BucketCache.java:447)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:432)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:418)
	at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:60)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.lambda$readBlock$2(HFileReaderImpl.java:1319)
	at java.util.Optional.ifPresent(Optional.java:159)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1317)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readAndUpdateNewBlock(HFileReaderImpl.java:942)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:931)
	at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:171)
	at org.apache.hadoop.hbase.io.HalfStoreFileReader.getFirstKey(HalfStoreFileReader.java:321)
	at org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:477)
	at org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:490)
	at
{noformat}
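A hedged sketch of the configuration under which the report says the failure shows up: prefetch-on-open enabled while HFile blocks are compressed. The values are illustrative only, not a confirmed reproduction recipe or workaround.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Prefetch blocks into the cache when a store file is opened.
Configuration conf = HBaseConfiguration.create();
conf.setBoolean("hbase.rs.prefetchblocksonopen", true);
// Whether the cache stores blocks in their on-disk (compressed) form;
// the validation failure involves the uncompressed copy of such a block.
conf.setBoolean("hbase.block.data.cachecompressed", false);
{code}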
[jira] [Created] (HBASE-26972) Restored table from snapshot that has MOB is inconsistent
Sergey Soldatov created HBASE-26972:
---

Summary: Restored table from snapshot that has MOB is inconsistent
Key: HBASE-26972
URL: https://issues.apache.org/jira/browse/HBASE-26972
Project: HBase
Issue Type: Bug
Components: mob, snapshots
Affects Versions: 3.0.0-alpha-2
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov

When we restore a table from a snapshot and it has MOB files, there are links that do not fit the HFileLink pattern. I'm not sure what side effects that might lead to, but at least it's not possible to create a snapshot right after the restore:
{quote}
Version 3.0.0-alpha-3-SNAPSHOT, rcd45cadbc1a42db359ff4e775cbd4b55cfe28140, Fri Apr 22 03:04:25 PM PDT 2022
Took 0.0016 seconds
hbase:001:0> list_snapshot
list_snapshot_sizes   list_snapshots
hbase:001:0> list_snapshots
SNAPSHOT   TABLE + CREATION TIME
 t1        table_1 (2022-04-22 15:48:04 -0700)
1 row(s)
Took 1.0881 seconds
=> ["t1"]
hbase:002:0> restore_snapshot 't1'
Took 2.3942 seconds
hbase:003:0> snapshot
snapshot   snapshot_cleanup_enabled   snapshot_cleanup_switch
hbase:003:0> snapshot 'table_1', 't2'

ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=t2 table=table_1 type=FLUSH ttl=0 } had an error. Procedure t2 { waiting=[] done=[] }
	at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:403)
	at org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1325)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
	at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:106)
	at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:86)
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=t2 table=table_1 type=FLUSH ttl=0 } due to exception:Can't find hfile: table_1=1bccf339572b9a4db7475abcf57eeb8f-bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a in the real (hdfs://localhost:8020/hbase2.4/mobdir/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a) or archive (hdfs://localhost:8020/hbase2.4/archive/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a) directory for the primary table.:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Can't find hfile: table_1=1bccf339572b9a4db7475abcf57eeb8f-bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a in the real (hdfs://localhost:8020/hbase2.4/mobdir/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a) or archive (hdfs://localhost:8020/hbase2.4/archive/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a) directory for the primary table.
	at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:82)
	at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:322)
	at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:392)
	... 6 more
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Can't find hfile: table_1=1bccf339572b9a4db7475abcf57eeb8f-bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a
{quote}
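A hedged sketch of the pattern check involved: HFileLink names follow a fixed "table=region-hfile" naming convention, and a restored MOB file can be sanity-checked against it. This only illustrates the validation, not the fix; the path is a made-up example.

{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.HFileLink;

// Returns false for names that fall outside the HFileLink pattern,
// which is what the restored MOB links described above do.
Path restored = new Path(
    "/hbase/mobdir/data/table_1/1bccf339572b9a4db7475abcf57eeb8f/cf/somefile");
boolean looksLikeLink = HFileLink.isHFileLink(restored);
{code}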
[jira] [Created] (HBASE-26767) Rest server should not use a large Header Cache.
Sergey Soldatov created HBASE-26767:
---

Summary: Rest server should not use a large Header Cache.
Key: HBASE-26767
URL: https://issues.apache.org/jira/browse/HBASE-26767
Project: HBase
Issue Type: Bug
Components: REST
Affects Versions: 2.4.9
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov

In the RESTServer we set the HeaderCache size to DEFAULT_HTTP_MAX_HEADER_SIZE (65536). That's not compatible with jetty-9.4.x, where the cache size is limited to Character.MAX_VALUE - 1 (65534). According to the Jetty source code comments, higher values can cause a buffer overflow in the cache, which might lead to wrong or incomplete values being returned by the cache and subsequently to incorrect header handling. There are a couple of ways to fix it:
1. Change the value of DEFAULT_HTTP_MAX_HEADER_SIZE to 65534.
2. Make the header cache size configurable and set it separately from the header size.
I believe the second would give us more flexibility.
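A minimal Jetty 9.4-style sketch of option 2, decoupling the two sizes. The headerCacheSize value is a hypothetical example of a separately configured value, not a recommended default.

{code}
import org.eclipse.jetty.server.HttpConfiguration;

// Size the header cache independently of the max header size, keeping it
// below Jetty 9.4's Character.MAX_VALUE - 1 limit.
int maxHeaderSize = 65536;   // DEFAULT_HTTP_MAX_HEADER_SIZE
int headerCacheSize = 1024;  // hypothetical separately configured value
HttpConfiguration httpConfig = new HttpConfiguration();
httpConfig.setRequestHeaderSize(maxHeaderSize);
httpConfig.setHeaderCacheSize(headerCacheSize);
{code}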
[jira] [Created] (HBASE-26463) Unreadable table names after HBASE-24605
Sergey Soldatov created HBASE-26463:
---

Summary: Unreadable table names after HBASE-24605
Key: HBASE-26463
URL: https://issues.apache.org/jira/browse/HBASE-26463
Project: HBase
Issue Type: Bug
Components: UI
Affects Versions: 3.0.0-alpha-1
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov
Attachments: After.png, Before.png

During the fix for HBASE-24605 (Break long region names in the web UI), the 'word break' rule was applied to the 'table-striped' style, so literally all tables in the Master web UI were affected. The most noticeable is 'User Tables', where table names, descriptions, and even the State column are wrapped in a very ugly way. We have two options here:
1. Fix the user tables only, as was done previously for the procedures table.
2. Apply 'word break' only to those tables that require it.
Since most users are comfortable with the current UI, we may go with the first option, so the changes would affect only the User Tables. Sample screenshots before and after are attached.
[jira] [Created] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.
Sergey Soldatov created HBASE-20927:
---

Summary: RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.
Key: HBASE-20927
URL: https://issues.apache.org/jira/browse/HBASE-20927
Project: HBase
Issue Type: Bug
Affects Versions: 2.1.0
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov
Fix For: 2.1.1

Admin.clearDeadServers is supposed to return the list of servers that were not cleared. But if RSGroupAdminEndpoint is set, a ConstraintException is thrown:
{noformat}
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException): org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers to remove cannot be null or empty.
	at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573)
	at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519)
	at org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607)
	at org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604)
	at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
	at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
	at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604)
	at org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231)
	at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
{noformat}
That happens because postClearDeadServers calls groupAdminServer.removeServers(clearedServer) even if clearedServer is empty.
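A hedged sketch of the guard implied by the last sentence (signatures simplified; groupAdminServer is the field named in the description): only forward to removeServers when something was actually cleared.

{code}
import java.util.HashSet;
import java.util.List;
import org.apache.hadoop.hbase.net.Address;

// In RSGroupAdminEndpoint.postClearDeadServers: an empty cleared-server
// list must not reach removeServers, which rejects empty sets.
void postClearDeadServers(List<Address> clearedServers) throws Exception {
  if (!clearedServers.isEmpty()) {
    groupAdminServer.removeServers(new HashSet<>(clearedServers));
  }
}
{code}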
[jira] [Created] (HBASE-20926) IntegrationTestRSGroup is broken
Sergey Soldatov created HBASE-20926:
---

Summary: IntegrationTestRSGroup is broken
Key: HBASE-20926
URL: https://issues.apache.org/jira/browse/HBASE-20926
Project: HBase
Issue Type: Bug
Components: integration tests
Affects Versions: 2.1.0
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov
Fix For: 2.1.1

There are several problems:
1. It doesn't work in minicluster mode, because afterMethod() uses IntegrationTestingUtility.restoreCluster(), which simply shuts down the minicluster when not in distributed mode.
2. It uses tests from TestRSGroups, which was supposed to be common for both unit and integration tests, but over the last two years a number of tests were added that use internal APIs and are not compatible with distributed mode.
[jira] [Created] (HBASE-20852) inconsistent version report in Master UI after HBASE-20722
Sergey Soldatov created HBASE-20852:
---

Summary: inconsistent version report in Master UI after HBASE-20722
Key: HBASE-20852
URL: https://issues.apache.org/jira/browse/HBASE-20852
Project: HBase
Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0
Reporter: Sergey Soldatov

The Master web UI is able to report when the versions of an RS and the master differ. Previously the check was performed between the master version obtained from the Version class (generated during the build) and the RS version from RegionServerTracker (which used the same Version class). HBASE-20722 changed this behavior: now for an RS we get a numeric version and convert it back to a string representation. That works only for plain numeric versions like 2.0.0 / 2.1.0, but it doesn't work for -SNAPSHOT or any other custom versions, and the Master UI then reports that all region servers have an inconsistent version. [~Apache9], [~stack], [~tedyu] FYI
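A hedged illustration of why the numeric round-trip is lossy. The bit layout below is hypothetical (for the idea only): packing major.minor.patch into an int cannot carry a "-SNAPSHOT" suffix, so the rebuilt string never matches the master's full version string.

{code}
// Hypothetical major.minor.patch packing for 2.1.0:
int encoded = (2 << 20) | (1 << 10) | 0;
// Decoding recovers only the numeric triple...
String rebuilt = String.format("%d.%d.%d",
    encoded >> 20, (encoded >> 10) & 0x3FF, encoded & 0x3FF); // "2.1.0"
// ...so the comparison against a suffixed build string always fails.
boolean same = rebuilt.equals("2.1.0-SNAPSHOT"); // false -> "inconsistent"
{code}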
[jira] [Created] (HBASE-20719) HTable.batch() doesn't handle TableNotFound correctly.
Sergey Soldatov created HBASE-20719:
---

Summary: HTable.batch() doesn't handle TableNotFound correctly.
Key: HBASE-20719
URL: https://issues.apache.org/jira/browse/HBASE-20719
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 2.1.0
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov
Fix For: 2.1.0

batch() as well as delete() are processed using AsyncRequest. Problems are reported via RetriesExhaustedWithDetailsException, and there is no special handling for the TableNotFound exception. So the final result of running batch or delete operations against a nonexistent table looks really weird and misleading:
{noformat}
hbase(main):003:0> delete 't1', 'r1', 'c1'
2018-06-12 15:02:50,742 ERROR [main] client.AsyncRequestFutureImpl: Cannot get replica 0 location for {"totalColumns":1,"row":"r1","families":{"c1":[{"qualifier":"","vlen":0,"tag":[],"timestamp":9223372036854775807}]},"ts":9223372036854775807}
ERROR: Failed 1 action: t1: 1 time, servers with issues: null
{noformat}
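A hedged sketch of the behavior the report asks for: surface a clear TableNotFoundException up front instead of the generic "Cannot get replica 0 location" wrapper. This is an illustration using the public Admin API, not the committed fix.

{code}
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.TableNotFoundException;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;

// Fail fast with a meaningful exception before submitting the batch.
void ensureTableExists(Connection conn, TableName table) throws Exception {
  try (Admin admin = conn.getAdmin()) {
    if (!admin.tableExists(table)) {
      throw new TableNotFoundException(table.getNameAsString());
    }
  }
}
{code}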
[jira] [Created] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
Sergey Soldatov created HBASE-20657:
---

Summary: Retrying RPC call for ModifyTableProcedure may get stuck
Key: HBASE-20657
URL: https://issues.apache.org/jira/browse/HBASE-20657
Project: HBase
Issue Type: Bug
Components: Client, proc-v2
Affects Versions: 2.0.0
Reporter: Sergey Soldatov

Env: 2 masters, 1 RS. Steps to reproduce: the active master is killed while a ModifyTableProcedure is executing. If the table has enough regions, it may happen that by the time the standby master becomes active some of the regions are closed, so once the client retries the call to the new active master, a new ModifyTableProcedure is created and gets stuck while handling the MODIFY_TABLE_REOPEN_ALL_REGIONS state. That happens because:
1. When we retry from the client side, we call modifyTableAsync, which creates a procedure with a new nonce key:
{noformat}
ModifyTableRequest request = RequestConverter.buildModifyTableRequest(
    td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
{noformat}
So on the server side it's considered a new procedure and starts executing immediately.
2. When we process MODIFY_TABLE_REOPEN_ALL_REGIONS we create a MoveRegionProcedure for each region, but it checks whether the region is online (and it's not), so it fails immediately, forcing the procedure to restart.
[~an...@apache.org] saw a similar case when two concurrent ModifyTable procedures were running and got stuck in a similar way.
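A hedged sketch of the nonce problem described in point 1, reusing the names from the snippet above (ng, td, RequestConverter): if the nonce pair is minted once outside the retry loop, the server can recognize the retried call as a duplicate instead of starting a second procedure. Illustration only, not the committed fix.

{code}
// Obtain the nonce pair once, before any retries.
long nonceGroup = ng.getNonceGroup();
long nonce = ng.newNonce();
for (int attempt = 0; attempt < maxAttempts; attempt++) {
  // Same nonce on every retry, so the server deduplicates the procedure.
  ModifyTableRequest request = RequestConverter.buildModifyTableRequest(
      td.getTableName(), td, nonceGroup, nonce);
  // submit the request; break on success, back off on retryable failure
}
{code}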
[jira] [Created] (HBASE-20621) Unclear error for deleting from a nonexistent table.
Sergey Soldatov created HBASE-20621:
---

Summary: Unclear error for deleting from a nonexistent table.
Key: HBASE-20621
URL: https://issues.apache.org/jira/browse/HBASE-20621
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 2.0.0
Reporter: Sergey Soldatov

When I try to delete a row from a nonexistent table, the error is quite confusing. Instead of getting a table-not-found exception I get:
{noformat}
ERROR [main] client.AsyncRequestFutureImpl: Cannot get replica 0 location for {"totalColumns":1,"row":"r1","families":{"c1":[{"qualifier":"","vlen":0,"tag":[],"timestamp":9223372036854775807}]},"ts":9223372036854775807}
ERROR: Failed 1 action: t1: 1 time, servers with issues: null
{noformat}
That happens because delete uses AsyncRequestFuture, which wraps all region location errors into the 'Cannot get replica' error. I expect that other actions like batch, mutateRow, and checkAndDelete behave in the same way.
[jira] [Created] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
Sergey Soldatov created HBASE-19863:
---

Summary: java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
Key: HBASE-19863
URL: https://issues.apache.org/jira/browse/HBASE-19863
Project: HBase
Issue Type: Bug
Components: Filters
Affects Versions: 1.4.1
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov

Under some circumstances a scan with SingleColumnValueFilter may fail with an exception:
{noformat}
java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, qualifier=C2, timestamp=1516433595543, comparison result: 1
	at org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
	at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
{noformat}
Conditions: a table T with a single column family '0' that uses a ROWCOL bloom filter (important) and column qualifiers C1, C2, C3, C4, C5. When we fill the table, for every row we put a deleted cell for C3. The table has a single region with two HStores:
A: start row 0, stop row 99
B: start row 10, stop row 99
B has newer versions of rows 10-99. The store files have several blocks each (important). Store A is the result of a major compaction, so it doesn't have any deleted cells (important).
So, we run a scan like:
{noformat}
scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter ('0','C5',=,'binary:whatever')"}
{noformat}
How the scan proceeds: first, we iterate A for rows 0 and 1 without any problems. Next, we start to iterate A for row 10, read the first cell and set the hfs scanner to A: 10:0/C1/0/Put/x, but find that we have a newer version of the cell in B: 10:0/C1/1/Put/x, so we make B our current store scanner. Since we are looking for the particular columns C3 and C5, we perform the StoreScanner.seekOrSkipToNextColumn optimization, which runs reseek for all store scanners. For store A the following magic happens in requestSeek:
1. The bloom filter check passesGeneralBloomFilter sets haveToSeek to false, because row 10 doesn't have the C3 qualifier in store A.
2. Since we don't have to seek, we just create a fake row 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for us and is commented with:
{noformat}
// Multi-column Bloom filter optimization.
// Create a fake key/value, so that this scanner only bubbles up to the top
// of the KeyValueHeap in StoreScanner after we scanned this row/column in
// all other store files. The query matcher will then just skip this fake
// key/value and the store scanner will progress to the next column. This
// is obviously not a "real real" seek, but unlike the fake KV earlier in
// this method, we want this to be propagated to ScanQueryMatcher.
{noformat}
For store B we set it to the fake 10:0/C3/createFirstOnRowColTS()/Maximum to skip C3 entirely. After that we start searching for qualifier C5 using seekOrSkipToNextColumn, which first runs trySkipToNextColumn:
{noformat}
protected boolean trySkipToNextColumn(Cell cell) throws IOException {
  Cell nextCell = null;
  do {
    Cell nextIndexedKey = getNextIndexedKey();
    if (nextIndexedKey != null && nextIndexedKey != KeyValueScanner.NO_NEXT_INDEXED_KEY
        && matcher.compareKeyForNextColumn(nextIndexedKey, cell) >= 0) {
      this.heap.next();
      ++kvsScanned;
    } else {
      return false;
    }
  } while ((nextCell = this.heap.peek()) != null && CellUtil.matchingRowColumn(cell, nextCell));
  return true;
}
{noformat}
If the store has several blocks, then nextIndexedKey is not null and compareKeyForNextColumn isn't negative, so we keep searching forward until we hit the next index or the end of the row. But in this.heap.next(), the scanner for A bubbles
[jira] [Created] (HBASE-19775) hbase shell doesn't handle the exceptions that are wrapped in java.io.UncheckedIOException
Sergey Soldatov created HBASE-19775:
---

Summary: hbase shell doesn't handle the exceptions that are wrapped in java.io.UncheckedIOException
Key: HBASE-19775
URL: https://issues.apache.org/jira/browse/HBASE-19775
Project: HBase
Issue Type: Bug
Components: shell
Affects Versions: 2.0.0-beta-1
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov
Fix For: 2.0.0-beta-1

The HBase shell doesn't have a notion of UncheckedIOException, so it may not handle it correctly. For example, if we scan a nonexistent table the error looks weird:
{noformat}
hbase(main):001:0> scan 'a'
ROW                          COLUMN+CELL
ERROR: a
{noformat}
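A hedged sketch of the unwrapping idea: java.io.UncheckedIOException always wraps an IOException, so peeling it off lets existing IOException handling (such as the table-not-found message) apply. "doScan" is a hypothetical stand-in for the wrapped client call.

{code}
import java.io.IOException;
import java.io.UncheckedIOException;

// Re-throw the wrapped IOException so the shell's error formatting
// recognizes it.
void scanSafely(Runnable doScan) throws IOException {
  try {
    doScan.run();
  } catch (UncheckedIOException e) {
    throw e.getCause(); // never null for UncheckedIOException
  }
}
{code}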
[jira] [Created] (HBASE-19774) incorrect behavior of locateRegionInMeta
Sergey Soldatov created HBASE-19774:
---

Summary: incorrect behavior of locateRegionInMeta
Key: HBASE-19774
URL: https://issues.apache.org/jira/browse/HBASE-19774
Project: HBase
Issue Type: Bug
Affects Versions: 2.0.0-beta-1
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov
Fix For: 2.0.0-beta-1

When we try to operate on a nonexistent table, in some circumstances we get an incorrect report:
{noformat}
ERROR: Region of 'hbase:namespace,,1510363071508.0d8ddea7654f95130959218e9bc9c89c.' is expected in the table of 'nonExistentUsertable', but hbase:meta says it is in the table of 'hbase:namespace'. hbase:meta might be damaged.
{noformat}
[jira] [Created] (HBASE-19717) IntegrationTestDDLMasterFailover is using outdated values for DataBlockEncoding
Sergey Soldatov created HBASE-19717:
---

Summary: IntegrationTestDDLMasterFailover is using outdated values for DataBlockEncoding
Key: HBASE-19717
URL: https://issues.apache.org/jira/browse/HBASE-19717
Project: HBase
Issue Type: Bug
Components: integration tests
Affects Versions: 2.0.0-beta-1
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov
Fix For: 2.0.0-beta-1

We have removed PREFIX_TREE data block encoding, but IntegrationTestDDLMasterFailover is still using it.
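A hedged sketch of one way to keep such a test from going stale: derive the candidate encodings from the enum itself, so removed constants like PREFIX_TREE can never be referenced. Illustration only, not the committed fix.

{code}
import java.util.Random;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

// Pick a random encoding from whatever the current enum actually defines.
Random random = new Random();
DataBlockEncoding[] encodings = DataBlockEncoding.values();
DataBlockEncoding pick = encodings[random.nextInt(encodings.length)];
{code}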
[jira] [Created] (HBASE-19705) Remove AMv2 dependency on advancing clock
Sergey Soldatov created HBASE-19705:
---

Summary: Remove AMv2 dependency on advancing clock
Key: HBASE-19705
URL: https://issues.apache.org/jira/browse/HBASE-19705
Project: HBase
Issue Type: Bug
Components: amv2
Affects Versions: 2.0.0-beta-1
Reporter: Sergey Soldatov

As per a discussion on the dev list, it would be nice to remove the dependency of AMv2 on an advancing clock.
[jira] [Created] (HBASE-19393) HTTP 413 FULL head while accessing HBase UI using SSL.
Sergey Soldatov created HBASE-19393:
---

Summary: HTTP 413 FULL head while accessing HBase UI using SSL.
Key: HBASE-19393
URL: https://issues.apache.org/jira/browse/HBASE-19393
Project: HBase
Issue Type: Bug
Components: UI
Affects Versions: 1.4.0
Environment: SSL enabled for UI/REST.
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov
Fix For: 1.4.0

For REST/UI we use a 64KB header buffer size instead of the Jetty default of 6KB (?). But it turns out we set it only for the _http_ protocol, not for _https_. So if SSL is enabled, it's quite easy to get an HTTP 413 error. Not relevant to branch-2 or master because it's fixed there by HBASE-12894.
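A hedged sketch of the fix idea, written against the Jetty 9-style API for illustration (the affected 1.4.0 line may use an older Jetty): the enlarged header size has to be applied to the https connector's configuration as well, not only to the plain http one.

{code}
import org.eclipse.jetty.server.HttpConfiguration;
import org.eclipse.jetty.server.SecureRequestCustomizer;

// Build the https configuration with the same enlarged header size that the
// http connector already uses, so SSL requests don't hit "413 FULL head".
HttpConfiguration httpsConfig = new HttpConfiguration();
httpsConfig.setRequestHeaderSize(64 * 1024); // match the http connector
httpsConfig.addCustomizer(new SecureRequestCustomizer());
{code}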
[jira] [Created] (HBASE-19304) KEEP_DELETED_CELLS should ignore case
Sergey Soldatov created HBASE-19304:
---

Summary: KEEP_DELETED_CELLS should ignore case
Key: HBASE-19304
URL: https://issues.apache.org/jira/browse/HBASE-19304
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 2.0.0-alpha-4
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov

Since HBASE-12363 we started using an enum instead of a boolean for KEEP_DELETED_CELLS. In ColumnFamilyDescriptorBuilder we use valueOf to parse the value of the property. But there is a problem: all values in the enum are uppercase, so if we provide the value in lowercase (and java's Boolean returns it in lowercase from toString), table creation may fail with an exception:
{code}
java.io.IOException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.KeepDeletedCells.true
	at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1028)
	at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:891)
	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:859)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6966)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6923)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6894)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6850)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6801)
	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:285)
	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:110)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hbase.KeepDeletedCells.true
	at java.lang.Enum.valueOf(Enum.java:238)
	at org.apache.hadoop.hbase.KeepDeletedCells.valueOf(KeepDeletedCells.java:30)
	at org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder$ModifyableColumnFamilyDescriptor.lambda$getStringOrDefault$23(ColumnFamilyDescriptorBuilder.java:719)
	at org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder$ModifyableColumnFamilyDescriptor.getOrDefault(ColumnFamilyDescriptorBuilder.java:727)
	at org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder$ModifyableColumnFamilyDescriptor.getStringOrDefault(ColumnFamilyDescriptorBuilder.java:719)
	at org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder$ModifyableColumnFamilyDescriptor.getKeepDeletedCells(ColumnFamilyDescriptorBuilder.java:901)
	at org.apache.hadoop.hbase.regionserver.ScanInfo.<init>(ScanInfo.java:69)
	at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:265)
	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5485)
	at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:992)
	at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:989)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
{code}
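A hedged sketch of the case-insensitive parsing the summary asks for: normalize before Enum.valueOf so "true"/"false" written by Boolean.toString() still resolve to the TRUE/FALSE constants. Illustration of the idea, not necessarily the committed change.

{code}
import java.util.Locale;
import org.apache.hadoop.hbase.KeepDeletedCells;

// "true" -> "TRUE" -> KeepDeletedCells.TRUE instead of an
// IllegalArgumentException from the case-sensitive valueOf.
KeepDeletedCells parseKeepDeletedCells(String value) {
  return KeepDeletedCells.valueOf(value.toUpperCase(Locale.ROOT));
}
{code}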
[jira] [Created] (HBASE-15884) NPE in StoreFileScanner during reverse scan
Sergey Soldatov created HBASE-15884:
---

Summary: NPE in StoreFileScanner during reverse scan
Key: HBASE-15884
URL: https://issues.apache.org/jira/browse/HBASE-15884
Project: HBase
Issue Type: Bug
Components: Scanners
Affects Versions: 2.0.0
Reporter: Sergey Soldatov

Here is a part of the {{skipKVsNewerThanReadpoint}} method:
{noformat}
hfs.next();
setCurrentCell(hfs.getKeyValue());
if (this.stopSkippingKVsIfNextRow
    && getComparator().compareRows(cur.getRowArray(), cur.getRowOffset(), cur.getRowLength(),
      startKV.getRowArray(), startKV.getRowOffset(), startKV.getRowLength()) > 0) {
{noformat}
If hfs has no more KVs, cur will be set to null, and the next step will throw an NPE.
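A hedged sketch of a guard for the case described, reusing the names from the quoted snippet (hfs, cur, startKV): bail out when the underlying scanner is exhausted instead of dereferencing a null cur. Illustration only, not the committed fix.

{code}
hfs.next();
setCurrentCell(hfs.getKeyValue());
if (cur == null) {
  return false; // underlying scanner exhausted; avoid the NPE below
}
if (this.stopSkippingKVsIfNextRow
    && getComparator().compareRows(cur.getRowArray(), cur.getRowOffset(), cur.getRowLength(),
      startKV.getRowArray(), startKV.getRowOffset(), startKV.getRowLength()) > 0) {
  // original handling continues here
}
{code}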
[jira] [Created] (HBASE-11829) TestHCM.testClusterStatus fails with timeout
Sergey Soldatov created HBASE-11829:
---

Summary: TestHCM.testClusterStatus fails with timeout
Key: HBASE-11829
URL: https://issues.apache.org/jira/browse/HBASE-11829
Project: HBase
Issue Type: Bug
Components: test
Environment: Ubuntu 14.04 64bit
java version 1.7.0_65
Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
Reporter: Sergey Soldatov
Priority: Minor

The test fails with an exception:

java.lang.Exception: Unexpected exception, expected<org.apache.hadoop.hbase.regionserver.RegionServerStoppedException> but was<junit.framework.AssertionFailedError>
	at org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:28)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)

Note: the failure usually reproduces on machines where OpenVPN is installed.
[jira] [Created] (HBASE-11830) TestReplicationThrottler.testThrottling failed on virtual boxes
Sergey Soldatov created HBASE-11830:
---

Summary: TestReplicationThrottler.testThrottling failed on virtual boxes
Key: HBASE-11830
URL: https://issues.apache.org/jira/browse/HBASE-11830
Project: HBase
Issue Type: Bug
Components: test
Environment: kvm with Centos 6.5, openjdk1.7
Reporter: Sergey Soldatov
Priority: Minor

During test runs, TestReplicationThrottler.testThrottling sometimes fails with an assertion:

testThrottling(org.apache.hadoop.hbase.replication.regionserver.TestReplicationThrottler)  Time elapsed: 0.229 sec  FAILURE!
java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:86)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertTrue(Assert.java:52)
	at org.apache.hadoop.hbase.replication.regionserver.TestReplicationThrottler.testThrottling(TestReplicationThrottler.java:69)
[jira] [Created] (HBASE-11770) TestBlockCacheReporting.testBucketCache is not stable
Sergey Soldatov created HBASE-11770:
---

Summary: TestBlockCacheReporting.testBucketCache is not stable
Key: HBASE-11770
URL: https://issues.apache.org/jira/browse/HBASE-11770
Project: HBase
Issue Type: Bug
Components: test
Environment: kvm box with Ubuntu 12.04 Desktop 64bit.
java version 1.7.0_65
Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov

Depending on the machine and OS, TestBlockCacheReporting.testBucketCache may fail with an NPE:

java.lang.NullPointerException
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:417)
	at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:80)
	at org.apache.hadoop.hbase.io.hfile.TestBlockCacheReporting.addDataAndHits(TestBlockCacheReporting.java:67)
	at org.apache.hadoop.hbase.io.hfile.TestBlockCacheReporting.testBucketCache(TestBlockCacheReporting.java:86)
[jira] [Created] (HBASE-11069) Decouple region merging from ZooKeeper
Sergey Soldatov created HBASE-11069:
---

Summary: Decouple region merging from ZooKeeper
Key: HBASE-11069
URL: https://issues.apache.org/jira/browse/HBASE-11069
Project: HBase
Issue Type: Sub-task
Components: Consensus, Zookeeper
Reporter: Sergey Soldatov

As part of HBASE-10296, Region Merge should be decoupled from ZooKeeper.
[jira] [Created] (HBASE-10985) Decouple Split Transaction from Zookeeper
Sergey Soldatov created HBASE-10985:
---

Summary: Decouple Split Transaction from Zookeeper
Key: HBASE-10985
URL: https://issues.apache.org/jira/browse/HBASE-10985
Project: HBase
Issue Type: Sub-task
Components: regionserver, Zookeeper
Reporter: Sergey Soldatov

As part of HBASE-10296, SplitTransaction should be decoupled from ZooKeeper. This is an initial patch for review. At the moment the consensus provider is placed directly in SplitTransaction to minimize the affected code. In the ideal world it should be done in HServer.