[jira] [Updated] (HBASE-28488) Avoid expensive allocation in createRegionSpan
[ https://issues.apache.org/jira/browse/HBASE-28488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28488:
    Labels: pull-request-available (was: )

Key: HBASE-28488
URL: https://issues.apache.org/jira/browse/HBASE-28488
Project: HBase
Issue Type: Improvement
Components: tracing
Affects Versions: 2.5.0
Environment: Multiple clusters with:
* OpenJDK 11.0.22+7
* HBase 2.5.7
* 90-95% write requests
Reporter: Thibault Deutsch
Priority: Minor
Labels: pull-request-available
Attachments: 0001-HBASE-28488-Use-encoded-name-in-region-span-attribut.patch, Screenshot 2024-04-05 at 00.27.11.png

On our busy clusters, the allocation profile shows that createRegionSpan() is responsible for 15-20% of all allocations. These allocations come from getRegionNameAsString().

getRegionNameAsString() takes the region name and encodes invisible characters in their hex representation. This requires a StringBuilder and thus generates a new string on every call, which becomes very expensive on a cluster handling a high number of requests.

We have a patch that replaces the call with getEncodedName(). It seems better to use just the encoded region name (the md5 part) in trace attributes, because:
- it is fixed in size (the full region name can be much longer depending on the rowkey size),
- it is enough information to link a trace to a region,
- it requires no new allocation.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
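To illustrate the allocation difference, here is a minimal sketch. The toStringBinary method below is a simplified stand-in for HBase's Bytes.toStringBinary (not the actual implementation), and the region name is a made-up example; the point is that hex-escaping the full name allocates a builder plus a new String per call, while the encoded name is a fixed 32-character string the region already holds.

```java
import java.nio.charset.StandardCharsets;

public class RegionNameDemo {
    // Simplified stand-in for Bytes.toStringBinary: hex-escapes
    // non-printable bytes, allocating a StringBuilder and a new
    // String on every invocation.
    static String toStringBinary(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            int ch = x & 0xFF;
            if (ch >= ' ' && ch <= '~' && ch != '\\') {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical region name: table, row key (with a non-printable
        // byte), timestamp, then the md5-based encoded name suffix.
        byte[] regionName = "t1,row\u0001key,1700000000000.abcdef0123456789abcdef0123456789."
            .getBytes(StandardCharsets.ISO_8859_1);
        String full = toStringBinary(regionName);   // fresh allocation per request
        String encoded = "abcdef0123456789abcdef0123456789"; // already cached on the region
        System.out.println("full: " + full);
        System.out.println("encoded length: " + encoded.length());
    }
}
```

The non-printable byte shows up as `\x01` in the escaped output, demonstrating why a StringBuilder pass is unavoidable for the full name.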
[jira] [Updated] (HBASE-28974) Improve the log message with parameterized logging in LRUBlockCache
[ https://issues.apache.org/jira/browse/HBASE-28974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28974:
    Labels: pull-request-available (was: )

Key: HBASE-28974
URL: https://issues.apache.org/jira/browse/HBASE-28974
Project: HBase
Issue Type: Improvement
Components: logging
Reporter: Chandra Sekhar K
Assignee: Chandra Sekhar K
Priority: Minor
Labels: pull-request-available

Improve the log messages with parameterized logging in LRUBlockCache.java and LRUAdaptiveBlockCache.java.
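The benefit of parameterized logging is that the message is only assembled when the level is enabled. The tiny stand-in logger below is not SLF4J (which HBase actually uses); it just demonstrates the mechanism: with string concatenation the message string is built even when DEBUG is off, while with a `{}` template nothing is formatted.

```java
public class ParamLogDemo {
    static boolean debugEnabled = false; // DEBUG off, as in production
    static int formatCalls = 0;

    // Stand-in for LOG.debug("... {} ...", args): the template is only
    // expanded when the level is enabled.
    static void debug(String template, Object... args) {
        if (!debugEnabled) {
            return; // no formatting, no allocation
        }
        formatCalls++;
        for (Object a : args) {
            template = template.replaceFirst("\\{}", String.valueOf(a));
        }
        System.out.println(template);
    }

    public static void main(String[] args) {
        // Concatenation style would build "evicted 1024 blocks" even
        // with DEBUG disabled; the parameterized call is a cheap no-op.
        debug("evicted {} blocks", 1024);
        System.out.println("format calls: " + formatCalls);
    }
}
```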
[jira] [Updated] (HBASE-25357) allow specifying binary row key range to pre-split regions
[ https://issues.apache.org/jira/browse/HBASE-25357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-25357:
    Labels: pull-request-available (was: )

Key: HBASE-25357
URL: https://issues.apache.org/jira/browse/HBASE-25357
Project: HBase
Issue Type: Improvement
Components: spark
Reporter: Yubao Liu
Priority: Major
Labels: pull-request-available

Currently, the Spark HBase connector uses a `String` to specify regionStart and regionEnd, but we often have serialized binary row keys. I made a small patch at [https://github.com/apache/hbase-connectors/pull/72/files] to always treat the `String` as ISO_8859_1, so we can put raw bytes into the String object and get them back unchanged.

This has a drawback: if your row keys really are Unicode strings beyond the ISO_8859_1 charset, you have to convert them to UTF-8 encoded bytes and then wrap those bytes in an ISO_8859_1 string. This is a limitation of the Spark option interface, which only allows string-to-string maps.

{code:java}
import java.nio.charset.StandardCharsets;

df.write()
  .format("org.apache.hadoop.hbase.spark")
  .option(HBaseTableCatalog.tableCatalog(), catalog)
  .option(HBaseTableCatalog.newTable(), 5)
  .option(HBaseTableCatalog.regionStart(),
      new String("你好".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1))
  .option(HBaseTableCatalog.regionEnd(),
      new String("世界".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1))
  .mode(SaveMode.Append)
  .save();
{code}
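The round-trip claim above rests on a property of ISO-8859-1: every byte value 0..255 maps to exactly one char, so arbitrary binary data survives a bytes-to-String-to-bytes conversion unchanged. A minimal self-contained check:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTripDemo {
    public static void main(String[] args) {
        // Arbitrary binary row key, including bytes outside ASCII.
        byte[] rowKey = new byte[] {0x00, (byte) 0x9F, (byte) 0xFF, 0x42};
        // Carry the bytes through a String, as the Spark option map requires.
        String carrier = new String(rowKey, StandardCharsets.ISO_8859_1);
        byte[] back = carrier.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(rowKey, back));
    }
}
```

The same round trip with UTF-8 would corrupt bytes that are not valid UTF-8 sequences, which is why the patch pins the charset to ISO_8859_1.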
[jira] [Updated] (HBASE-28975) Update dev-support/hbase_docker Dockerfiles to use JDK17
[ https://issues.apache.org/jira/browse/HBASE-28975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28975:
    Labels: pull-request-available (was: )

Key: HBASE-28975
URL: https://issues.apache.org/jira/browse/HBASE-28975
Project: HBase
Issue Type: Bug
Affects Versions: 4.0.0-alpha-1
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Major
Labels: pull-request-available

The HBase master branch requires JDK17, but the docker files under dev-support/hbase_docker install only JDK8. Time to update them to JDK17.
[jira] [Updated] (HBASE-28976) Fix UT flakeyness in TestBucketCache.testBlockAdditionWaitWhenCache and TestVerifyBucketCacheFile.testRetrieveFromFile
[ https://issues.apache.org/jira/browse/HBASE-28976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28976:
    Labels: pull-request-available (was: )

Key: HBASE-28976
URL: https://issues.apache.org/jira/browse/HBASE-28976
Project: HBase
Issue Type: Bug
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil
Priority: Major
Labels: pull-request-available

Noticed these two tests failing intermittently on some of the pre-commit runs.

1) TestBucketCache.testBlockAdditionWaitWhenCache: The test calls BucketCache.cacheBlock() passing true for the waitWhenCache parameter, assuming the cache call will wait for a successful cache. However, this is only true if the BucketCache.QUEUE_ADDITION_WAIT_TIME property is also defined, so I believe this may be causing the intermittent failures when the pre-commit runs on slower VMs.

2) TestVerifyBucketCacheFile.testRetrieveFromFile: One of the checks performed by this test is to delete the bucket cache file, shut down the current BucketCache instance, then create a new BucketCache instance that would load the persistent file but should fail to recover the cache, as the cache file has been deleted. Internally, this recovery works asynchronously. We first read the contents of the persistent file, which includes the last serialized backingMap and a checksum of the cache file at that time, then compare this recovered checksum against the current cache file checksum. If this verification fails, a background thread is started to traverse all backingMap entries and check whether those entries are still available in the cache. At this point, the test checks the BucketCache allocator size, expecting it to be 0, but if the background verification has not finished yet, this assert will fail.
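A common cure for the second kind of flakiness is to poll for the expected state with a deadline instead of asserting immediately after kicking off asynchronous work. The sketch below uses a hand-rolled waitFor helper and a simulated background verification; it is an illustration of the pattern, not the HBase test utility (HBase has its own Waiter class for this).

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

public class AwaitDemo {
    // Poll until the condition holds or the deadline passes.
    static boolean waitFor(long timeoutMs, BooleanSupplier cond) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (cond.getAsBoolean()) {
                return true;
            }
            Thread.sleep(10);
        }
        return cond.getAsBoolean();
    }

    public static void main(String[] args) throws Exception {
        AtomicLong allocatedSize = new AtomicLong(100);
        // Simulated background verification that eventually frees the allocator.
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { }
            allocatedSize.set(0);
        }).start();
        // Asserting allocatedSize == 0 immediately here would be flaky;
        // waiting for the condition with a generous timeout is not.
        System.out.println(waitFor(2000, () -> allocatedSize.get() == 0));
    }
}
```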
[jira] [Updated] (HBASE-28638) Impose retry limit for specific errors to recover from remote procedure failure using server crash
[ https://issues.apache.org/jira/browse/HBASE-28638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28638:
    Labels: pull-request-available (was: )

Key: HBASE-28638
URL: https://issues.apache.org/jira/browse/HBASE-28638
Project: HBase
Issue Type: Sub-task
Components: amv2, master, Region Assignment
Affects Versions: 3.0.0-beta-1, 2.6.1, 2.5.10
Reporter: Viraj Jasani
Assignee: Viraj Jasani
Priority: Major
Labels: pull-request-available

As per one of the recent incidents, some regions faced a 5+ minute availability drop because, before the active master could initiate an SCP for the dead server, some region moves tried to assign regions on the already-dead regionserver. Sometimes, due to transient issues, the active master only gets notified after a few minutes (5+ minutes in this case).

{code:java}
2024-05-08 03:47:38,518 WARN [RSProcedureDispatcher-pool-4790] procedure.RSProcedureDispatcher - request to host1,61020,1713411866443 failed due to org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Call to address=host1:61020 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosedException: Connection closed, try=0, retrying...
{code}

And as we know, we have infinite retries here, so it kept going on.

Eventually, the SCP could only be initiated after the active master discovered the server as dead:

{code:java}
2024-05-08 03:50:01,038 DEBUG [RegionServerTracker-0] master.DeadServer - Processing host1,61020,1713411866443; numProcessing=1
2024-05-08 03:50:01,038 INFO [RegionServerTracker-0] master.RegionServerTracker - RegionServer ephemeral node deleted, processing expiration [host1,61020,1713411866443]
{code}

leading to

{code:java}
2024-05-08 03:50:02,313 DEBUG [RSProcedureDispatcher-pool-4833] assignment.RegionRemoteProcedureBase - pid=54800701, ppid=54800691, state=RUNNABLE; OpenRegionProcedure 5cafbe54d5685acc6c4866758e67fd51, server=host1,61020,1713411866443 for region state=OPENING, location=host1,61020,1713411866443, table=T1, region=5cafbe54d5685acc6c4866758e67fd51, targetServer host1,61020,1713411866443 is dead, SCP will interrupt us, give up
{code}

This entire outage could be avoided if we could fail fast on connection-drop errors.

*Problem Statement:*
Master-initiated remote procedures are scheduled by RSProcedureDispatcher. If it encounters specific errors on the first attempt (e.g. CallQueueTooBigException or SaslException), it is guaranteed that the remote call has not reached the regionserver, so the remote call is marked failed, prompting the parent procedure to select a different target regionserver to resume the operation. If the first attempt is successful, RSProcedureDispatcher continues with infinite retries. We can encounter valid cases (e.g. ConnectionClosedException) where this halts the remote operation; without manual intervention, it can delay the region-in-transition by several minutes or even hours.

*Proposed Solution:*
The purpose of this Jira is to impose a retry limit for specific error types such that, if the retry limit is reached, the master can recover the state of the ongoing remote call failure by initiating an SCP (ServerCrashProcedure) on the target server. The SCP will override the TRSP (TransitRegionStateProcedure) if required. This ensures the target server has no region hosted online before we suspend the ongoing TRSP. Scheduling an SCP for the target server always leads to the regionserver ending up in a stopped state: either the regionserver is stopped automatically, or, if the regionserver manages to send its region report to the master, the master rejects it, which in turn leads to a regionserver abort.
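The proposed solution above can be sketched as a bounded-retry dispatch loop. Everything here is hypothetical and simplified: the retry cap, the exception type check, and the "ESCALATE_TO_SCP" outcome stand in for the real RSProcedureDispatcher machinery.

```java
import java.util.function.Supplier;

public class RetryLimitDemo {
    // Hypothetical cap: after this many failures of a retriable-but-suspect
    // error type, stop retrying and escalate instead.
    static final int MAX_RETRIES_FOR_CONNECTION_CLOSED = 3;

    static String dispatch(Supplier<String> remoteCall) {
        int attempts = 0;
        while (true) {
            try {
                return remoteCall.get();
            } catch (RuntimeException e) {
                attempts++;
                if (attempts > MAX_RETRIES_FOR_CONNECTION_CLOSED) {
                    // Retry budget exhausted: hand recovery to the master,
                    // which would schedule a ServerCrashProcedure.
                    return "ESCALATE_TO_SCP";
                }
                // otherwise fall through and retry
            }
        }
    }

    public static void main(String[] args) {
        // A remote call that always fails with a connection-closed style error.
        System.out.println(dispatch(() -> {
            throw new RuntimeException("Connection closed");
        }));
    }
}
```

The key design point is that the loop is no longer infinite for this error class: the outage window becomes bounded by (retry cap × retry interval) instead of waiting minutes for the dead-server notification.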
[jira] [Updated] (HBASE-27638) Get slow/large log response that matched the ‘CLIENT_IP' without client port
[ https://issues.apache.org/jira/browse/HBASE-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-27638:
    Labels: pull-request-available (was: )

Key: HBASE-27638
URL: https://issues.apache.org/jira/browse/HBASE-27638
Project: HBase
Issue Type: Improvement
Affects Versions: 2.4.14
Reporter: mokai
Assignee: mokai
Priority: Major
Labels: pull-request-available

'get_largelog_responses' and 'get_slowlog_responses' support filtering the records for a given client by 'CLIENT_IP', but the user has to provide both the client IP and the client port. Since the client mostly uses an ephemeral port, the user experience would be better if 'CLIENT_IP' without a port were supported, returning all records that match the client IP.
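The requested behaviour can be sketched as a matcher that falls back to address-only comparison when the filter omits the port. This is a hypothetical illustration, not the HBase implementation; note the naive lastIndexOf split would need more care for bracketed IPv6 addresses.

```java
public class ClientIpFilterDemo {
    // If the user-supplied filter contains a port ("ip:port"), require an
    // exact match; if it is an IP only, compare just the address part of
    // the recorded "ip:port" client address.
    static boolean matches(String filter, String clientAddress) {
        if (filter.contains(":")) {
            return filter.equals(clientAddress);
        }
        int sep = clientAddress.lastIndexOf(':');
        String ipOnly = sep < 0 ? clientAddress : clientAddress.substring(0, sep);
        return filter.equals(ipOnly);
    }

    public static void main(String[] args) {
        // IP-only filter matches any ephemeral port from that client.
        System.out.println(matches("10.0.0.5", "10.0.0.5:49152"));
        // IP:port filter still requires the exact port.
        System.out.println(matches("10.0.0.5:1000", "10.0.0.5:49152"));
    }
}
```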
[jira] [Updated] (HBASE-28806) ExportSnapshot failed if reference file presented
[ https://issues.apache.org/jira/browse/HBASE-28806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28806:
    Labels: pull-request-available (was: )

Key: HBASE-28806
URL: https://issues.apache.org/jira/browse/HBASE-28806
Project: HBase
Issue Type: Bug
Components: snapshots
Affects Versions: 2.4.14
Reporter: mokai
Assignee: mokai
Priority: Major
Labels: pull-request-available

ExportSnapshot tasks failed due to a FileNotFoundException (FNFE). If there are reference files in data.manifest, they are resolved to the files they refer to, so the same referred-to files can be handled by multiple map tasks. Copying the same file concurrently in different map tasks then causes the FNFE.
[jira] [Updated] (HBASE-28949) Remove the netty 3 dependency management section
[ https://issues.apache.org/jira/browse/HBASE-28949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28949:
    Labels: pull-request-available (was: )

Key: HBASE-28949
URL: https://issues.apache.org/jira/browse/HBASE-28949
Project: HBase
Issue Type: Task
Components: dependencies
Reporter: Duo Zhang
Assignee: Duo Zhang
Priority: Major
Labels: pull-request-available

After HBASE-28942, netty will only be pulled in from hadoop 3.3.x, so we can remove the dependency management section for it.

This will also remove the dependabot warnings about netty 3.
[jira] [Updated] (HBASE-28969) Refactor direct interactions of HFilelink file creations to SFT interface
[ https://issues.apache.org/jira/browse/HBASE-28969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28969:
    Labels: pull-request-available (was: )

Key: HBASE-28969
URL: https://issues.apache.org/jira/browse/HBASE-28969
Project: HBase
Issue Type: Improvement
Reporter: Prathyusha
Assignee: Prathyusha
Priority: Major
Labels: pull-request-available

HBASE-27826 introduces tracking of Link/Reference files with the StoreFileTracker interface. With that, the FileBasedStoreFileTracker implementation can track Link/Reference files using only the .filelist file itself, without creating actual link/ref files (virtual links).

To support this, we need to move all direct Ref/Link file creations into the SFT layer. This change:
- adds a createHFilelink() api to StoreFileTrackerBase;
- moves all creations of HFileLink (except for Snapshot, which will be handled in HBASE-28863) into the SFT layer.
[jira] [Updated] (HBASE-28634) There is a possibility that data cannot be obtained during reverse fuzzy query.
[ https://issues.apache.org/jira/browse/HBASE-28634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28634:
    Labels: pull-request-available (was: )

Key: HBASE-28634
URL: https://issues.apache.org/jira/browse/HBASE-28634
Project: HBase
Issue Type: Bug
Components: Filters
Affects Versions: 2.4.14
Reporter: qazwsx
Assignee: Dávid Paksy
Priority: Major
Labels: pull-request-available
Attachments: image-2024-06-03-14-26-57-645.png, image-2024-06-03-14-27-17-516.png

This is my example:

1.
create 'abcd','f'
put 'abcd','111311','f:name','a'
put 'abcd','111444','f:name','a'
put 'abcd','111511','f:name','a'
put 'abcd','111611','f:name','a'
put 'abcd','111446','f:name','a'
put 'abcd','111777 ','f:name','a'
put 'abcd',' 111777','f:name','a'

2. When I don't use the reversed query, I can get the data:
scan 'abcd', {FILTER => org.apache.hadoop.hbase.filter.FuzzyRowFilter.new(Arrays.asList(Pair.new(Bytes.toBytesBinary('111433'), Bytes.toBytesBinary("\xFF\xFF\xFF\xFF\x02\x02"))))}
!image-2024-06-03-14-26-57-645.png!

3. When I use the reversed query, I cannot get the data:
scan 'abcd', {REVERSED => TRUE, FILTER => org.apache.hadoop.hbase.filter.FuzzyRowFilter.new(Arrays.asList(Pair.new(Bytes.toBytesBinary('111433'), Bytes.toBytesBinary("\xFF\xFF\xFF\xFF\x02\x02"))))}
!image-2024-06-03-14-27-17-516.png!

4. Testing shows that the following issue may be related: HBASE-26232. How can the problem fixed by that issue be reproduced? Currently, I work around the fuzzy query problem by rolling back the code of that issue. Is there a better solution?
[jira] [Updated] (HBASE-28973) Support client access to HBase servers using IP in IPV6 env
[ https://issues.apache.org/jira/browse/HBASE-28973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28973:
    Labels: pull-request-available (was: )

Key: HBASE-28973
URL: https://issues.apache.org/jira/browse/HBASE-28973
Project: HBase
Issue Type: Bug
Components: master, regionserver
Reporter: Y. SREENIVASULU REDDY
Assignee: Y. SREENIVASULU REDDY
Priority: Major
Labels: pull-request-available

Using an IPV6 address, creating the WAL path fails. An IPV6 address contains ":" (e.g. [0:0:0:0:0:0:0:1]), and the FS does not support creating a path containing ":".

So, when use-IP is true and the environment is IPv6:
# Encode the IPV6 address when creating the WAL path.
# Decode the IPV6 address when RS server names are parsed from the WAL path.
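The encode/decode pair described above can be sketched as a simple percent-style escaping of ":" in the host component. This is a hypothetical scheme for illustration (the actual encoding chosen in the patch may differ, and a production version would also need to escape a literal "%" in the input):

```java
public class Ipv6WalPathDemo {
    // HDFS path components may not contain ':', so escape it before the
    // host is embedded in a WAL directory name.
    static String encodeHost(String host) {
        return host.replace(":", "%3A");
    }

    // Restore the original address when parsing the server name back out
    // of the WAL path.
    static String decodeHost(String encoded) {
        return encoded.replace("%3A", ":");
    }

    public static void main(String[] args) {
        String host = "0:0:0:0:0:0:0:1";
        String enc = encodeHost(host);
        System.out.println(enc);
        System.out.println(decodeHost(enc).equals(host));
    }
}
```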
[jira] [Updated] (HBASE-28972) Limit the number of retries in FanOutOneBlockAsyncDFSOutputHelper.completeFile
[ https://issues.apache.org/jira/browse/HBASE-28972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28972:
    Labels: pull-request-available (was: )

Key: HBASE-28972
URL: https://issues.apache.org/jira/browse/HBASE-28972
Project: HBase
Issue Type: Improvement
Components: Filesystem Integration, io, wal
Reporter: Duo Zhang
Assignee: Duo Zhang
Priority: Major
Labels: pull-request-available

After HBASE-28955, when shutting down MiniDFSCluster, we close all the output streams. In some WAL-related tests we want to keep the WAL file open, so we set the namenode to safe mode before shutting down, as in TestWALFactory.testAppendClose.

In FanOutOneBlockAsyncDFSOutputHelper.completeFile, we only give up when hitting LeaseExpireException, so in this scenario we will block there forever...

In general, if there is an error while completing the file, we can go through the recoverLease logic, so it is OK to throw an exception from completeFile. We should change the implementation so it does not block there forever.
[jira] [Updated] (HBASE-28963) Updating Quota Factors is too expensive
[ https://issues.apache.org/jira/browse/HBASE-28963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28963:
    Labels: pull-request-available (was: )

Key: HBASE-28963
URL: https://issues.apache.org/jira/browse/HBASE-28963
Project: HBase
Issue Type: Bug
Affects Versions: 2.6.1
Reporter: Ray Mattingly
Assignee: Ray Mattingly
Priority: Major
Labels: pull-request-available
Attachments: image-2024-11-06-12-06-44-317.png, quota-refresh-hmaster.png

My company is running Quotas across a few hundred clusters of varied size. One cluster has hundreds of servers and tens of thousands of regions. We noticed that the HMaster was quite busy for this cluster, and after some investigation we realized that RegionServers were hammering the HMaster's ClusterMetrics endpoint to facilitate the refreshing of table machine quota factors.

There are a few things that we could do here. In a perfect world, the RegionServers would have better P2P communication of region states, and whatever else is necessary to derive new quota factors. Relying solely on the HMaster for this coordination creates a tricky bottleneck for the horizontal scalability of clusters.

That said, a simpler and preferable initial step would be to make our code a bit more cost-conscious. At my company, for example, we don't even define any table-scoped quotas. Without any table-scoped quotas in the cache, our cache could be much more thoughtful about the work that it chooses to do on each refresh. So I'm proposing that we check [the size of the tableQuotaCache keyset|https://github.com/apache/hbase/blob/db3ba44a4c692d26e70b6030fc519e92fd79f638/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/QuotaCache.java#L418] earlier, and use this to determine which ClusterMetrics we bother to fetch.
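The cost-conscious refresh proposed above boils down to an early-exit guard: skip the expensive ClusterMetrics fetch entirely when the table quota cache is empty. A minimal sketch, with hypothetical names standing in for the real QuotaCache internals:

```java
import java.util.HashMap;
import java.util.Map;

public class QuotaRefreshDemo {
    static int clusterMetricsFetches = 0;

    // Stand-in for the expensive HMaster ClusterMetrics RPC.
    static Map<String, Integer> fetchClusterMetrics() {
        clusterMetricsFetches++;
        return new HashMap<>();
    }

    // Hypothetical refresh: only fetch the region-level metrics needed for
    // table machine quota factors when table-scoped quotas actually exist.
    static void refreshQuotaFactors(Map<String, Object> tableQuotaCache) {
        if (tableQuotaCache.isEmpty()) {
            return; // nothing to recompute, skip the HMaster round trip
        }
        fetchClusterMetrics();
    }

    public static void main(String[] args) {
        // A cluster where no table-scoped quotas are defined.
        refreshQuotaFactors(new HashMap<>());
        System.out.println("fetches: " + clusterMetricsFetches);
    }
}
```

On a cluster with no table quotas, every periodic refresh becomes a no-op instead of an HMaster round trip, which removes the bottleneck described in the report.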
[jira] [Updated] (HBASE-28965) Make the approach in HBASE-28955 can work together with hadoop 2.x
[ https://issues.apache.org/jira/browse/HBASE-28965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28965:
    Labels: pull-request-available (was: )

Key: HBASE-28965
URL: https://issues.apache.org/jira/browse/HBASE-28965
Project: HBase
Issue Type: Improvement
Components: Filesystem Integration, hadoop2, io
Reporter: Duo Zhang
Priority: Major
Labels: pull-request-available

The solution before HBASE-28955 worked with hadoop 2.x, but after applying the changes in HBASE-28955 we can work with hadoop 3.4.x while no longer being able to compile against hadoop 2.x.

We should find a way to support hadoop 2.x so we can also port the changes to branch-2.x.
[jira] [Updated] (HBASE-28967) [hbase-thirdparty] add jersey-media-json-jackson in hbase-shaded-jersey
[ https://issues.apache.org/jira/browse/HBASE-28967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28967:
    Labels: pull-request-available (was: )

Key: HBASE-28967
URL: https://issues.apache.org/jira/browse/HBASE-28967
Project: HBase
Issue Type: Bug
Affects Versions: thirdparty-4.1.9
Reporter: dingbaosheng
Assignee: dingbaosheng
Priority: Major
Labels: pull-request-available

In a REST service built using hbase-shaded-jersey.jar, I'm unable to convert a Response object into a JSON string, because the jar lacks the necessary dependencies from jersey-media-json-jackson.
[jira] [Updated] (HBASE-28962) Meta replication is inconsistent after startup when reusing hbase storage location
[ https://issues.apache.org/jira/browse/HBASE-28962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28962:
    Labels: pull-request-available (was: )

Key: HBASE-28962
URL: https://issues.apache.org/jira/browse/HBASE-28962
Project: HBase
Issue Type: Bug
Reporter: Richárd Antal
Priority: Major
Labels: pull-request-available

Meta replication is inconsistent after startup when reusing an HBase storage location, for example when a new cluster is created with the same S3 storage location a previous one had. On both clusters, USE_META_REPLICAS is set to true and META_REPLICAS_NUM is set to 3.

hbck output:
{noformat}
ERROR: hbase:meta, replicaId 1 is not found on any region.
ERROR: hbase:meta, replicaId 2 is not found on any region.
ERROR: hbase:meta table is not consistent. Run HBCK with proper fix options to fix hbase:meta inconsistency. Exiting...
Summary:
3 inconsistencies detected.
Status: INCONSISTENT{noformat}
[jira] [Updated] (HBASE-28952) Add coprocessor hook to authorize user based on client SSL certificate chain
[ https://issues.apache.org/jira/browse/HBASE-28952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28952:
    Labels: pull-request-available (was: )

Key: HBASE-28952
URL: https://issues.apache.org/jira/browse/HBASE-28952
Project: HBase
Issue Type: New Feature
Components: Coprocessors, rpc, security
Affects Versions: 3.0.0-beta-1, 2.6.1
Reporter: Andor Molnar
Assignee: Andor Molnar
Priority: Major
Labels: pull-request-available

In order to authorize the connected user based on the provided SSL client certificate chain, a new coprocessor hook is needed for Master and RegionServer, because currently there is no way for a coprocessor to abort a connection when or after the user is authenticated.
[jira] [Updated] (HBASE-28920) Add Clearwater Analytics to powered by hbase
[ https://issues.apache.org/jira/browse/HBASE-28920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28920:
    Labels: pull-request-available (was: )

Key: HBASE-28920
URL: https://issues.apache.org/jira/browse/HBASE-28920
Project: HBase
Issue Type: Task
Components: documentation
Reporter: Sanjeev Dhiman
Assignee: Duo Zhang
Priority: Major
Labels: pull-request-available

According to [https://hbase.apache.org/poweredbyhbase.html], is this the right ticket to add my organization name? If so, could you please add the content below? Let me know if you have any questions. Thank you!

==
Clearwater Analytics ([https://clearwateranalytics.com/]) has an HBase cluster with 9 nodes in production. We have a handful of tables hosting billions of rows (and growing) in a time series manner. Some tables are pretty wide, with several hundreds of columns, while others are very tall and narrow.
==

Thanks!
Sanjeev Dhiman
Principal Engineer
Clearwater Analytics
[jira] [Updated] (HBASE-28961) Add essential build plugins to hbase-diagnostics
[ https://issues.apache.org/jira/browse/HBASE-28961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28961:
    Labels: pull-request-available (was: )

Key: HBASE-28961
URL: https://issues.apache.org/jira/browse/HBASE-28961
Project: HBase
Issue Type: Sub-task
Reporter: Nihal Jain
Assignee: Nihal Jain
Priority: Major
Labels: pull-request-available

It seems I missed adding some essential/required plugins while adding the new module.
[jira] [Updated] (HBASE-28955) Improve lease renew for FanOutOneBlockAsyncDFSOutput
[ https://issues.apache.org/jira/browse/HBASE-28955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28955:
    Labels: pull-request-available (was: )

Key: HBASE-28955
URL: https://issues.apache.org/jira/browse/HBASE-28955
Project: HBase
Issue Type: Bug
Reporter: Duo Zhang
Assignee: Duo Zhang
Priority: Major
Labels: pull-request-available

When working with hadoop 3.4.x, we saw this in the stdout file:
{noformat}
Exception in thread "LeaseRenewer:zhangduo@home" java.lang.NullPointerException: Cannot invoke "org.apache.hadoop.hdfs.DFSOutputStream.getNamespace()" because "outputStream" is null
    at org.apache.hadoop.hdfs.DFSClient.getNamespaces(DFSClient.java:596)
    at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:618)
    at org.apache.hadoop.hdfs.client.impl.LeaseRenewer.renew(LeaseRenewer.java:425)
    at org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:445)
    at org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$800(LeaseRenewer.java:77)
    at org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:336)
    at java.base/java.lang.Thread.run(Thread.java:840)
{noformat}
This is because the newer DFSClient implementation needs to be passed a namespace when renewing the lease, so we can no longer just pass null as the DFSOutputStream when calling DFSClient.beginFileLease. We should find a way to deal with it.
[jira] [Updated] (HBASE-28956) RSMobFileCleanerChore may close the StoreFileReader object which is being used by Compaction thread
[ https://issues.apache.org/jira/browse/HBASE-28956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HBASE-28956:
    Labels: pull-request-available (was: )

Key: HBASE-28956
URL: https://issues.apache.org/jira/browse/HBASE-28956
Project: HBase
Issue Type: Bug
Components: Compaction, mob
Reporter: guluo
Assignee: guluo
Priority: Major
Labels: pull-request-available

For MOB tables, RSMobFileCleanerChore is responsible for cleaning MOB files that are no longer referenced by the regions located on the current RegionServer.

RSMobFileCleanerChore gets the information about MOB files by reading the store files, as follows:
{code:java}
// RSMobFileCleanerChore.chore()
sf.initReader();
byte[] mobRefData = sf.getMetadataValue(HStoreFile.MOB_FILE_REFS);
byte[] bulkloadMarkerData = sf.getMetadataValue(HStoreFile.BULKLOAD_TASK_KEY);
// close store file to avoid memory leaks
sf.closeStoreFile(true);
{code}

There is an issue here: if the StoreFileReader was not created by RSMobFileCleanerChore but RSMobFileCleanerChore closes it anyway, the thread that created the reader can no longer use it, eventually resulting in an ERROR.

Reproduction:
This is an occasional problem, but the probability of its occurrence can be increased with the following modifications:
1. Set hbase.master.mob.cleaner.period from 24h to 10s, and restart HBase.
2. Put some mob data into a MOB table.
3. At the same time, execute the compaction command for the MOB table; it is possible that the problem will occur.

The error logs are as follows:
{noformat}
ERROR: java.io.IOException: Cannot invoke "org.apache.hadoop.hbase.regionserver.StoreFileReader.getMaxTimestamp()" because the return value of "org.apache.hadoop.hbase.regionserver.HStoreFile.getReader()" is null
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:512)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.hadoop.hbase.regionserver.StoreFileReader.getMaxTimestamp()" because the return value of "org.apache.hadoop.hbase.regionserver.HStoreFile.getReader()" is null
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFileManager.lambda$getUnneededFiles$3(DefaultStoreFileManager.java:235)
{noformat}
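One way to state the fix pattern: a caller should only close a reader it created itself. The toy model below is a deliberate simplification (the real HStoreFile reader lifecycle is more involved, and the created-here flag would need to be per-caller rather than per-file), but it shows how closing unconditionally can null out a reader another thread still depends on.

```java
public class ReaderLifecycleDemo {
    static class Reader {
        boolean closed;
        void close() { closed = true; }
    }

    static class StoreFile {
        Reader reader;
        boolean createdHere;

        Reader initReader() {
            if (reader == null) {
                reader = new Reader();
                createdHere = true; // this caller owns the reader
            }
            return reader;
        }

        // Only close the reader if this caller created it; otherwise a
        // concurrent compaction thread may still be using it.
        void closeIfCreatedHere() {
            if (createdHere && reader != null) {
                reader.close();
                reader = null;
            }
        }
    }

    public static void main(String[] args) {
        StoreFile sf = new StoreFile();
        sf.reader = new Reader();  // reader opened earlier by a compaction thread
        sf.initReader();           // cleaner chore: reader already exists, not ours
        sf.closeIfCreatedHere();   // must not close the compaction's reader
        // The compaction thread's reader is still open and usable.
        System.out.println(sf.reader != null && !sf.reader.closed);
    }
}
```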
[jira] [Updated] (HBASE-28953) Prefetch thread shouldn't run for master store
[ https://issues.apache.org/jira/browse/HBASE-28953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28953: --- Labels: pull-request-available (was: ) > Prefetch thread shouldn't run for master store > -- > > Key: HBASE-28953 > URL: https://issues.apache.org/jira/browse/HBASE-28953 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Labels: pull-request-available > > The master store is hosted on Master processes. Since masters don't have a > BlockCache, we shouldn't run a prefetch thread at all when opening the > master store region. > Currently, this is causing a NoSuchElementException to be logged in master > logs, which, although harmless, can be confusing for operators: > {noformat} > 2024-10-23 15:23:29,236 WARN > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl: Prefetch > path=s3a://odx-qe-bucket/cc-odx-coy6zg/cod-udzhubgp5l5k/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/proc/336a510d9f9a472c9e2a8f3e00352e3b, > offset=0, end=82012 > java.util.NoSuchElementException: No value present > at java.base/java.util.Optional.get(Optional.java:143) > at > org.apache.hadoop.hbase.io.hfile.HFilePreadReader$1.run(HFilePreadReader.java:73) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at java.base/java.util.concurrent.ThreadPoolExecute {noformat} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28954) Apply workaround for HADOOP-19164 for Hadoop 3.4.1
[ https://issues.apache.org/jira/browse/HBASE-28954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28954: --- Labels: pull-request-available (was: ) > Apply workaround for HADOOP-19164 for Hadoop 3.4.1 > -- > > Key: HBASE-28954 > URL: https://issues.apache.org/jira/browse/HBASE-28954 > Project: HBase > Issue Type: Bug > Components: jenkins, test >Reporter: Istvan Toth >Assignee: Istvan Toth >Priority: Major > Labels: pull-request-available > > HADOOP-19164 is also present in 3.4.1, not just 3.4.0. > We need to update the scripts to apply the workaround for both 3.4.0 and 3.4.1 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28268) Add option to skip wal while using TableOutputFormat
[ https://issues.apache.org/jira/browse/HBASE-28268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28268: --- Labels: newbie pull-request-available (was: newbie) > Add option to skip wal while using TableOutputFormat > - > > Key: HBASE-28268 > URL: https://issues.apache.org/jira/browse/HBASE-28268 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Reporter: Ujjawal Kumar >Assignee: Raghavendra B >Priority: Minor > Labels: newbie, pull-request-available > > If we have replication set up between clusters A <-> B, during production > incidents where HFiles get corrupted (e.g. via HDFS missing blocks) on > cluster A, we can use the CopyTable job to copy the data from B. > In those cases, considering the data is being copied from B to A via > CopyTable, it doesn't make sense to write to WALs on A using > TableOutputFormat (since the mutations being copied already exist on B). > We can add a config to customise durability > ([https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Durability.java]) > within TableOutputFormat so that an operator can use it to skip WAL writes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
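A minimal sketch of the proposed configurable durability. Note the config key `hbase.mapreduce.output.durability` and the local Durability enum below are assumptions for illustration, not an existing HBase or TableOutputFormat API; the enum values mirror org.apache.hadoop.hbase.client.Durability:

```java
import java.util.Properties;

// Hypothetical sketch: the config key and the local Durability enum are
// assumptions for illustration, not the actual TableOutputFormat API.
class DurabilityConfigSketch {
    enum Durability { USE_DEFAULT, SKIP_WAL, ASYNC_WAL, SYNC_WAL, FSYNC_WAL }

    static final String KEY = "hbase.mapreduce.output.durability"; // made-up key

    // Falls back to the table's default durability when the operator sets nothing.
    static Durability fromConf(Properties conf) {
        return Durability.valueOf(conf.getProperty(KEY, "USE_DEFAULT"));
    }
}
```

A TableOutputFormat-style writer would then apply the resolved value to every mutation it writes, so a CopyTable restore run could skip WAL writes across the job.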
[jira] [Updated] (HBASE-28328) Add an option to count different types of Delete Markers in RowCounter
[ https://issues.apache.org/jira/browse/HBASE-28328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28328: --- Labels: pull-request-available (was: ) > Add an option to count different types of Delete Markers in RowCounter > -- > > Key: HBASE-28328 > URL: https://issues.apache.org/jira/browse/HBASE-28328 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Reporter: Himanshu Gwalani >Assignee: Shubham Roy >Priority: Minor > Labels: pull-request-available > > Add an option (count-delete-markers) to the > [RowCounter|https://github.com/apache/hbase/blob/8a9ad0736621fa1b00b5ae90529ca6065f88c67f/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java#L240C62-L240C75] > tool to count the number of Delete Markers of all types, i.e. (DELETE, > DELETE_COLUMN, DELETE_FAMILY, DELETE_FAMILY_VERSION). > We already have such a feature within our internal implementation of > RowCounter and it's very useful. > Implementation Ideas: > 1. If the option is passed, initialize the empty job counters for all 4 types > of deletes. > 2. Within the mapper, increase the respective delete counts while processing > each row. -- This message was sent by Atlassian Jira (v8.20.10#820010)
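The two implementation ideas above (pre-initialize one counter per delete type, then increment per cell inside the mapper) can be sketched with plain Java collections standing in for Hadoop MapReduce job counters; the names here are illustrative, not the actual RowCounter change:

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Sketch of the two steps from the issue, with a plain Map standing in for
// Hadoop MapReduce job counters (names here are illustrative).
class DeleteMarkerCounter {
    enum DeleteType { DELETE, DELETE_COLUMN, DELETE_FAMILY, DELETE_FAMILY_VERSION }

    // Step 1: initialize a zero counter for every type up front, so the job
    // counters exist even for types that never occur in the input.
    static Map<DeleteType, Long> newCounters() {
        Map<DeleteType, Long> counters = new EnumMap<>(DeleteType.class);
        for (DeleteType t : DeleteType.values()) {
            counters.put(t, 0L);
        }
        return counters;
    }

    // Step 2: inside the mapper, bump the counter matching each delete cell.
    static void count(Map<DeleteType, Long> counters, List<DeleteType> deleteCells) {
        for (DeleteType t : deleteCells) {
            counters.merge(t, 1L, Long::sum);
        }
    }
}
```

Initializing all four counters up front matters because a job counter that is never incremented would otherwise be absent from the job output entirely.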
[jira] [Updated] (HBASE-28861) Use hasReferences api of SFT in RegionSplitter
[ https://issues.apache.org/jira/browse/HBASE-28861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28861: --- Labels: pull-request-available (was: ) > Use hasReferences api of SFT in RegionSplitter > -- > > Key: HBASE-28861 > URL: https://issues.apache.org/jira/browse/HBASE-28861 > Project: HBase > Issue Type: Improvement >Reporter: Prathyusha >Assignee: Prathyusha >Priority: Minor > Labels: pull-request-available > > HBASE-28564 Adds apis in SFT layer for all interaction related to Reference > files - create, read, hasReferences > This Jira refactors RegionSplitter code to use the hasReferences api of SFT > instead of HRegionFileSystem.hasReferences() -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-24490) Revisit onlineRegions and onlineRegionsLock in HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-24490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-24490: --- Labels: pull-request-available (was: ) > Revisit onlineRegions and onlineRegionsLock in HRegionServer > > > Key: HBASE-24490 > URL: https://issues.apache.org/jira/browse/HBASE-24490 > Project: HBase > Issue Type: Task > Components: regionserver >Reporter: Duo Zhang >Assignee: Semen Komissarov >Priority: Major > Labels: pull-request-available > > The onlineRegionsLock is only used in closeMetaTableRegions and > closeUserRegions, and only the writeLock is used, so I wonder whether we can > change it to a plain Lock instead of a ReadWriteLock. > And also, in the several getRegions methods, we have synchronized > (this.onlineRegions), which is not necessary, since onlineRegions is > already a ConcurrentMap. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28948) RegionMover tool fails when table is deleted
[ https://issues.apache.org/jira/browse/HBASE-28948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28948: --- Labels: pull-request-available (was: ) > RegionMover tool fails when table is deleted > - > > Key: HBASE-28948 > URL: https://issues.apache.org/jira/browse/HBASE-28948 > Project: HBase > Issue Type: Bug >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Major > Labels: pull-request-available > > The issue occurs when the RegionMover tool encounters a > TableNotFoundException while moving regions. This happens if the table being > processed is deleted while the RegionMover is running. > {noformat} > 2024-10-30 11:58:20,500 INFO org.apache.hadoop.hbase.util.RegionMover: Moving > 10 regions to server0.example.org,16020,1730289478970 using 30 threads.Ack > mode:false > 2024-10-30 11:58:20,564 ERROR org.apache.hadoop.hbase.util.RegionMover: Error > while loading regions to server0.example.org > org.apache.hadoop.hbase.TableNotFoundException: cluster_test > at > org.apache.hadoop.hbase.client.HBaseAdmin$46.rpcCall(HBaseAdmin.java:2120) > at > org.apache.hadoop.hbase.client.HBaseAdmin$46.rpcCall(HBaseAdmin.java:2116) > at > org.apache.hadoop.hbase.client.RpcRetryingCallable.call(RpcRetryingCallable.java:57) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:104) > at > org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3088) > at > org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3080) > at > org.apache.hadoop.hbase.client.HBaseAdmin.checkTableExists(HBaseAdmin.java:2116) > at > org.apache.hadoop.hbase.client.HBaseAdmin.isTableEnabled(HBaseAdmin.java:978) > at > org.apache.hadoop.hbase.util.MoveWithAck.getServerNameForRegion(MoveWithAck.java:140) > at > org.apache.hadoop.hbase.util.RegionMover.loadRegions(RegionMover.java:358) > at > 
org.apache.hadoop.hbase.util.RegionMover.lambda$getRegionsMovePlan$2(RegionMover.java:333) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at java.base/java.lang.Thread.run(Thread.java:840) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28600) Enable setting blockcache on-heap sizes in bytes
[ https://issues.apache.org/jira/browse/HBASE-28600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28600: --- Labels: pull-request-available (was: ) > Enable setting blockcache on-heap sizes in bytes > > > Key: HBASE-28600 > URL: https://issues.apache.org/jira/browse/HBASE-28600 > Project: HBase > Issue Type: Improvement > Components: regionserver >Reporter: Nick Dimiduk >Assignee: JinHyuk Kim >Priority: Major > Labels: pull-request-available > > Specifying blockcache and memcache sizes as a percentage of heap is not > always ideal. Sometimes it's easier to specify exact values rather than > backing into a percentage. Let's introduce new configuration settings > (perhaps named similarly to {{hbase.bucketcache.size}}) that accept byte > values. Even nicer would be if these settings accepted human-friendly byte > values like {{512m}} or {{10g}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
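The human-friendly byte values the issue suggests ({{512m}}, {{10g}}) imply a small suffix-aware size parser; a minimal sketch follows, where the class and method names are illustrative and not an existing HBase API:

```java
// Hypothetical sketch of the human-friendly size parsing the issue asks for;
// the class and method names are illustrative, not an existing HBase API.
final class SizeParser {
    private SizeParser() {}

    // Parses "512m", "10g", "64k", or a plain byte count like "1024".
    static long parseBytes(String value) {
        String v = value.trim().toLowerCase();
        long multiplier;
        switch (v.charAt(v.length() - 1)) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            default:  return Long.parseLong(v); // no suffix: already bytes
        }
        return Long.parseLong(v.substring(0, v.length() - 1)) * multiplier;
    }
}
```

Using a long for the result avoids overflow for realistic cache sizes; for example, "10g" resolves to 10 * 2^30 = 10737418240 bytes.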
[jira] [Updated] (HBASE-28946) Update nightlies to run with HADOOP2_VERSION = 2.10.2
[ https://issues.apache.org/jira/browse/HBASE-28946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28946: --- Labels: pull-request-available (was: ) > Update nightlies to run with HADOOP2_VERSION = 2.10.2 > - > > Key: HBASE-28946 > URL: https://issues.apache.org/jira/browse/HBASE-28946 > Project: HBase > Issue Type: Bug > Components: hadoop2 >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > > We are running nightlies with an older Hadoop 2 and hence expected failures > are not seen: > * I see Hadoop distribution 2.10.0 installed for the last run at > [https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/lastCompletedBuild/execution/node/121/ws/hadoop-2/share/hadoop/hdfs/] > ** Ref: > [https://github.com/apache/hbase/blob/master/dev-support/Jenkinsfile#L133] > IMO this is wrong; the version should be 2.10.2 for all branches. > CC: [~stoty] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28945) Update nightlies to run with HADOOP3_VERSION = 3.2.4
[ https://issues.apache.org/jira/browse/HBASE-28945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28945: --- Labels: pull-request-available (was: ) > Update nightlies to run with HADOOP3_VERSION = 3.2.4 > > > Key: HBASE-28945 > URL: https://issues.apache.org/jira/browse/HBASE-28945 > Project: HBase > Issue Type: Bug > Components: hadoop3 >Affects Versions: 2.5.10 >Reporter: Nihal Jain >Assignee: Istvan Toth >Priority: Major > Labels: pull-request-available > > We are running nightlies with an older Hadoop 3 on branch-2.5 and hence > expected failures are not seen: > * I also see Hadoop 3.1.1 for hadoop3 at > [https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2.5/lastCompletedBuild/execution/node/121/ws/hadoop-3/share/hadoop/client/] > > ** Ref: > [https://github.com/apache/hbase/blob/branch-2.5/dev-support/Jenkinsfile#L157] > IMO this is wrong; the version should be 3.2.4 for branch-2.5. > CC: [~stoty] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28947) Backport "HBASE-27598 Upgrade mockito to 4.x" to branch-2.5
[ https://issues.apache.org/jira/browse/HBASE-28947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28947: --- Labels: pull-request-available (was: ) > Backport "HBASE-27598 Upgrade mockito to 4.x" to branch-2.5 > --- > > Key: HBASE-28947 > URL: https://issues.apache.org/jira/browse/HBASE-28947 > Project: HBase > Issue Type: Improvement > Components: dependencies, test >Affects Versions: 2.5.10 >Reporter: Duo Zhang >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > Fix For: 2.6.0, 3.0.0-alpha-4 > > > Mockito 2.28.2 was released on May 29, 2019, which is very old now. Let's > upgrade to the latest 4.x version, as mockito 5.0.0 requires java 11, which > is not suitable for us. > We need to backport this to branch-2.5 as we face > https://issues.apache.org/jira/browse/HBASE-28944 without it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28944) TestShadedHBaseTestingUtility fails with NCDFE: org/mockito/stubbing/Answer
[ https://issues.apache.org/jira/browse/HBASE-28944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28944: --- Labels: pull-request-available (was: ) > TestShadedHBaseTestingUtility fails with NCDFE: org/mockito/stubbing/Answer > --- > > Key: HBASE-28944 > URL: https://issues.apache.org/jira/browse/HBASE-28944 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.5.10 >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > > While working on https://github.com/apache/hbase/pull/6414 found we have > following pre-existing test failure: > {code:java} > [INFO] --- > [INFO] T E S T S > [INFO] --- > [INFO] Running org.apache.hbase.shaded.TestShadedHBaseTestingUtility > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.006 > s <<< FAILURE! - in org.apache.hbase.shaded.TestShadedHBaseTestingUtility > [ERROR] org.apache.hbase.shaded.TestShadedHBaseTestingUtility Time elapsed: > 0.002 s <<< ERROR! 
> java.lang.NoClassDefFoundError: org/mockito/stubbing/Answer > at > org.apache.hadoop.hdfs.MiniDFSCluster.isNameNodeUp(MiniDFSCluster.java:2590) > at > org.apache.hadoop.hdfs.MiniDFSCluster.isClusterUp(MiniDFSCluster.java:2604) > at > org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1479) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:958) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:849) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:689) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:669) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1141) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1116) > at > org.apache.hbase.shaded.TestShadedHBaseTestingUtility.setUp(TestShadedHBaseTestingUtility.java:46) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.lang.ClassNotFoundException: org.mockito.stubbing.Answer > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527) > ... 24 more > [INFO] > [INFO] Results: > [INFO] > [ERROR] Errors: > [ERROR] TestShadedHBaseTestingUtility.setUp:46 » NoClassDefFound > org/mockito/stubbing/Answer > [INFO] > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0 > [INFO] > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 16.878 s (Wall Clock) > [INFO] Finished at: 2024-10-30T07:43:39Z > [INFO] > > {code} > Seems related to https://issues.apache.org/jira/browse/HDFS-15915 > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28726) Revert REST protobuf package to org.apache.hadoop.hbase.rest
[ https://issues.apache.org/jira/browse/HBASE-28726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28726: --- Labels: pull-request-available (was: ) > Revert REST protobuf package to org.apache.hadoop.hbase.rest > > > Key: HBASE-28726 > URL: https://issues.apache.org/jira/browse/HBASE-28726 > Project: HBase > Issue Type: Bug > Components: REST >Reporter: Istvan Toth >Assignee: Istvan Toth >Priority: Major > Labels: pull-request-available > > In Hbase 3+, the package name of the REST messages has been renamed to > org.apache.hadoop.hbase.shaded.rest from org.apache.hadoop.hbase.rest > These definitions are only used by REST, and have nothing to do with standard > HBase RPC communication. > I propose reverting the package name. > We may also want to move the protobuf definitions back to the hbase-rest > module. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28942) Purge all netty 3 dependencies by default
[ https://issues.apache.org/jira/browse/HBASE-28942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28942: --- Labels: pull-request-available (was: ) > Purge all netty 3 dependencies by default > - > > Key: HBASE-28942 > URL: https://issues.apache.org/jira/browse/HBASE-28942 > Project: HBase > Issue Type: Task > Components: dependencies, security >Reporter: Duo Zhang >Priority: Major > Labels: pull-request-available > > After bumping the default hadoop version to 3.4.1, we should be able to > remove all netty 3 dependencies. > Now the problem is in hbase-external-blockcache, where we use jmemcached for > testing; it is an old library which still depends on netty 3. > We should find a way to deal with this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28943) Remove all jackson 1.x dependencies for hadoop-3 profile, since all jackson 1.x versions have vulnerabilities
[ https://issues.apache.org/jira/browse/HBASE-28943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28943: --- Labels: pull-request-available (was: ) > Remove all jackson 1.x dependencies for hadoop-3 profile, since all jackson > 1.x versions have vulnerabilities > - > > Key: HBASE-28943 > URL: https://issues.apache.org/jira/browse/HBASE-28943 > Project: HBase > Issue Type: Task > Components: hadoop3, security >Affects Versions: 2.6.1, 2.5.10 >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > > Building hbase with the hadoop-3 profile on branch-2 still requires jackson > 1.x jars, which have vulnerabilities. Ideally these should not be needed, as > with HADOOP-13332 hadoop has already "Remove jackson 1.9.13 and switch all > jackson code to 2.x code line" for branch-3. > Also in HBASE-27148, where we worked on "Move minimum hadoop 3 support > version to 3.2.3", we did a similar cleanup for branch-3 but somehow missed > porting the relevant changes to the branch-2 backport of the same Jira. > This task is to take care of this so that we do not need jackson 1.x to > build/run HBase with the hadoop-3 profile on branch-2.x. > > We have the following in our dependency tree: > {code:java} > [INFO] --< org.apache.hbase:hbase-shaded-client-byo-hadoop > >--- > [INFO] Building Apache HBase - Shaded - Client 2.7.0-SNAPSHOT > [33/53] > [INFO] from hbase-shaded/hbase-shaded-client-byo-hadoop/pom.xml > [INFO] [ jar > ]- > [INFO] > [INFO] +- org.codehaus.jackson:jackson-jaxrs:jar:1.9.13:provided > [INFO] +- org.codehaus.jackson:jackson-xc:jar:1.9.13:provided > . > . 
> [INFO] --< org.apache.hbase:hbase-shaded-mapreduce > >--- > [INFO] Building Apache HBase - Shaded - MapReduce 2.7.0-SNAPSHOT > [34/53] > [INFO] from hbase-shaded/hbase-shaded-mapreduce/pom.xml > [INFO] [ jar > ]- > [INFO] > [INFO] +- org.codehaus.jackson:jackson-jaxrs:jar:1.9.13:provided > [INFO] +- org.codehaus.jackson:jackson-xc:jar:1.9.13:provided > . > . > [INFO] -< org.apache.hbase:hbase-shaded-testing-util > >- > [INFO] Building Apache HBase - Shaded - Testing Util 2.7.0-SNAPSHOT > [46/53] > [INFO] from hbase-shaded/hbase-shaded-testing-util/pom.xml > [INFO] [ jar > ]- > [INFO] > [INFO] +- org.codehaus.jackson:jackson-jaxrs:jar:1.9.13:compile > [INFO] | +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile > [INFO] | \- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile > [INFO] | +- org.codehaus.jackson:jackson-jaxrs:jar:1.9.13:test > . > . > [INFO] -< org.apache.hbase:hbase-shaded-testing-util-tester > >-- > [INFO] Building Apache HBase - Shaded - Testing Util Tester 2.7.0-SNAPSHOT > [47/53] > [INFO] from hbase-shaded/hbase-shaded-testing-util-tester/pom.xml > [INFO] [ jar > ]- > [INFO] > [INFO] +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:test > [INFO] | \- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:test {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28940) Do not run the backwards compatibility tests with the default Hadoop3 version
[ https://issues.apache.org/jira/browse/HBASE-28940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28940: --- Labels: pull-request-available (was: ) > Do not run the backwards compatibility tests with the default Hadoop3 version > - > > Key: HBASE-28940 > URL: https://issues.apache.org/jira/browse/HBASE-28940 > Project: HBase > Issue Type: Improvement > Components: integration tests, test >Reporter: Istvan Toth >Assignee: Istvan Toth >Priority: Major > Labels: pull-request-available > > We are running the full test suite with it already, running the dev tests as > well is completely redundant, and a waste of time and resources. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28941) Clear all meta caches of the server on which hardware failure related exceptions occurred
[ https://issues.apache.org/jira/browse/HBASE-28941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28941: --- Labels: pull-request-available (was: ) > Clear all meta caches of the server on which hardware failure related > exceptions occurred > - > > Key: HBASE-28941 > URL: https://issues.apache.org/jira/browse/HBASE-28941 > Project: HBase > Issue Type: Improvement >Reporter: Eungsop Yoo >Assignee: Eungsop Yoo >Priority: Major > Labels: pull-request-available > > CallTimeoutException and ConnectException might be caused by a network or > hardware issue on that server. We might not be able to connect to that server > for a while, so we have to clear all meta caches of the server on which > hardware failure related exceptions occurred. If we don't clear the caches, > we might get the same exceptions as many times as the number of cached > locations for that server. > > https://issues.apache.org/jira/browse/HBASE-7590 > We already have the ClusterStatusPublisher/Listener feature, but it is not > possible to use this feature in some infrastructure environments like ours. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
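The proposed behavior (on a connection-level failure, drop every cached location that points at the failed server, not just the one region that errored) can be sketched with a plain region-to-server map standing in for the client's real meta location cache; all names below are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the proposed cache invalidation, with a plain region -> server map
// standing in for the client's real meta location cache (names illustrative).
class MetaCacheSketch {
    private final Map<String, String> locationCache = new ConcurrentHashMap<>();

    void cacheLocation(String region, String server) {
        locationCache.put(region, server);
    }

    String getCachedServer(String region) {
        return locationCache.get(region);
    }

    // Called when a CallTimeoutException/ConnectException suggests the whole
    // server is unreachable: drop every entry pointing at it, not just one.
    void clearCachesForServer(String server) {
        locationCache.values().removeIf(server::equals);
    }
}
```

Clearing by server rather than by region is what prevents repeating the same timeout once per cached region of the unreachable server.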
[jira] [Updated] (HBASE-28939) Change the default Hadoop 3 version to 3.4.1
[ https://issues.apache.org/jira/browse/HBASE-28939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28939: --- Labels: pull-request-available (was: ) > Change the default Hadoop 3 version to 3.4.1 > > > Key: HBASE-28939 > URL: https://issues.apache.org/jira/browse/HBASE-28939 > Project: HBase > Issue Type: Sub-task > Components: hadoop3 >Reporter: Istvan Toth >Assignee: Istvan Toth >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28937) Should check if compaction are needed after flushing
[ https://issues.apache.org/jira/browse/HBASE-28937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28937: --- Labels: pull-request-available (was: ) > Should check if compaction are needed after flushing > > > Key: HBASE-28937 > URL: https://issues.apache.org/jira/browse/HBASE-28937 > Project: HBase > Issue Type: Improvement > Components: Compaction >Affects Versions: 2.4.13 > Environment: hbase2.4.13 > Centos7 >Reporter: guluo >Priority: Major > Labels: pull-request-available > > Regardless of whether we have enabled the flush procedure, even if the > threshold is reached, HBase does not execute a compaction after the flush > operation. > I think we should check the HRegion.FlushResult, and send a compaction > request if it is needed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28935) [HBCK2] filesystem command always report region hole and doesn't exit automatically
[ https://issues.apache.org/jira/browse/HBASE-28935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28935: --- Labels: pull-request-available (was: ) > [HBCK2] filesystem command always report region hole and doesn't exit > automatically > --- > > Key: HBASE-28935 > URL: https://issues.apache.org/jira/browse/HBASE-28935 > Project: HBase > Issue Type: Bug > Components: hbase-operator-tools >Reporter: haosen chen >Assignee: haosen chen >Priority: Minor > Labels: pull-request-available > > 1. When executing the hbck2 filesystem command, it always reports a region > hole for every region, like: > {code:java} > ERROR: There is a hole in the region chain between and . You need to create > a new .regioninfo and region dir in hdfs to plug the hole. > ERROR: Found inconsistency in table hbase:meta > ERROR: There is a hole in the region chain between and . You need to create > a new .regioninfo and region dir in hdfs to plug the hole. > ERROR: Found inconsistency in table hbase:namespace{code} > The reason for this problem is that the region directories are not loaded > at all. In hbck1, the process loads all region directories by default. > 2. After the check is completed, the process does not exit automatically > because the hbasefsck threads are not daemon threads. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28934) The HFile Reader creation should not be blocked due to waits for cache initialisation.
[ https://issues.apache.org/jira/browse/HBASE-28934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28934: --- Labels: pull-request-available (was: ) > The HFile Reader creation should not be blocked due to waits for cache > initialisation. > -- > > Key: HBASE-28934 > URL: https://issues.apache.org/jira/browse/HBASE-28934 > Project: HBase > Issue Type: Bug > Components: BucketCache >Reporter: Janardhan Hungund >Assignee: Janardhan Hungund >Priority: Major > Labels: pull-request-available > > When persistent bucket cache is enabled, the bucket cache needs to read and > load the persistent cache file contents at region server startup. This > cache initialisation is currently asynchronous: the server restart completes > after spawning a thread to initialise the cache. > When subsequent HFile readers are created, the current implementation is > such that the constructor of the HFile reader (HFilePreadReader) waits until > the cache is initialised. Due to this wait, all client requests to these > regions would fail during the whole cache initialisation period, as these > regions are not yet online. > The correct way to handle this is that the constructors of HFile readers > should not wait for cache initialisation; only the prefetch threads should > wait for it. > Subsequently, any client requests should be served by accessing the > underlying file system if the cache is not yet initialised, and a cache miss > should be recorded in the cache stats. > Thanks, > Janardhan -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28177) mobStore directory do not set storage policy like normal store directory
[ https://issues.apache.org/jira/browse/HBASE-28177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28177: --- Labels: pull-request-available (was: ) > mobStore directory do not set storage policy like normal store directory > > > Key: HBASE-28177 > URL: https://issues.apache.org/jira/browse/HBASE-28177 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.5.6 >Reporter: xieyupei >Assignee: xieyupei >Priority: Minor > Labels: pull-request-available > > We set the block storage policy for the store directory only in HStore, but > the mobStore path is generated in HMobStore. I wrote a test case to > demonstrate this bug in TestHRegionFileSystem. > {code:java} > @Test > public void testMobStoreStoragePolicy() throws Exception { > TEST_UTIL = new HBaseTestingUtil(); > Configuration conf = TEST_UTIL.getConfiguration(); > TEST_UTIL.startMiniCluster(); > Table table = TEST_UTIL.createTable(TABLE_NAME, FAMILIES); > assertEquals("Should start with empty table", 0, > TEST_UTIL.countRows(table)); > HRegionFileSystem regionFs = getHRegionFS(TEST_UTIL.getConnection(), table, > conf); > try (Admin admin = TEST_UTIL.getConnection().getAdmin()) { > ColumnFamilyDescriptorBuilder cfdA = > ColumnFamilyDescriptorBuilder.newBuilder(FAMILIES[0]); > cfdA.setValue(HStore.BLOCK_STORAGE_POLICY_KEY, "ONE_SSD"); > cfdA.setMobEnabled(true); > cfdA.setMobThreshold(2L); > admin.modifyColumnFamily(TABLE_NAME, cfdA.build()); > while ( > > TEST_UTIL.getMiniHBaseCluster().getMaster().getAssignmentManager().getRegionStates() > .hasRegionsInTransition() > ) { > Thread.sleep(200); > LOG.debug("Waiting on table to finish schema altering"); > } > // flush memstore snapshot into 3 files > for (long i = 0; i < 3; i++) { > Put put = new Put(Bytes.toBytes(i)); > put.addColumn(FAMILIES[0], Bytes.toBytes(i), Bytes.toBytes(i)); > put.addColumn(FAMILIES[0], Bytes.toBytes(i + "qf"), Bytes.toBytes(i + > "value")); > table.put(put); > 
admin.flush(TABLE_NAME); > } > // there should be 3 files in store dir > FileSystem fs = TEST_UTIL.getDFSCluster().getFileSystem(); > Path storePath = regionFs.getStoreDir(Bytes.toString(FAMILIES[0])); > Path mobStorePath = MobUtils.getMobFamilyPath(conf, TABLE_NAME, > Bytes.toString(FAMILIES[0])); > FileStatus[] storeFiles = CommonFSUtils.listStatus(fs, storePath); > FileStatus[] mobStoreFiles = CommonFSUtils.listStatus(fs, mobStorePath); > assertNotNull(storeFiles); > assertEquals(3, storeFiles.length); > assertNotNull(mobStoreFiles); > assertEquals(3, mobStoreFiles.length); > for (FileStatus status : storeFiles) { > assertEquals("ONE_SSD", > ((HFileSystem) > regionFs.getFileSystem()).getStoragePolicyName(status.getPath())); > } > for (FileStatus status : mobStoreFiles) { > assertEquals("ONE_SSD", > ((HFileSystem) > regionFs.getFileSystem()).getStoragePolicyName(status.getPath())); > } > } finally { > table.close(); > TEST_UTIL.deleteTable(TABLE_NAME); > TEST_UTIL.shutdownMiniCluster(); > } > }{code} > We can also check the storage policy from the shell: > {code:java} > root@hbase-master:/usr/local/hadoop# ./bin/hdfs storagepolicies > -getStoragePolicy -path > /hbase/data/default/member/a645c7c2b31371449331a4e4106b073b/info > The storage policy of > /hbase/data/default/member/a645c7c2b31371449331a4e4106b073b/info: > BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], > replicationFallbacks=[ARCHIVE]} > root@hbase-master:/usr/local/hadoop# ./bin/hdfs storagepolicies > -getStoragePolicy -path > /hbase/mobdir/data/default/member/288b5f8af920a8190cc07bad277debb5/info > The storage policy of > /hbase/mobdir/data/default/member/288b5f8af920a8190cc07bad277debb5/info is > unspecified {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28927) Fix spotbugs issues introduced by refactoring to hbase-diagnostics with HBASE-28432
[ https://issues.apache.org/jira/browse/HBASE-28927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28927: --- Labels: pull-request-available (was: ) > Fix spotbugs issues introduced by refactoring to hbase-diagnostics with > HBASE-28432 > --- > > Key: HBASE-28927 > URL: https://issues.apache.org/jira/browse/HBASE-28927 > Project: HBase > Issue Type: Sub-task >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > > We have been seeing a lot of spotbugs issues across PRs lately. These come from > the main code newly added by the refactoring in > https://issues.apache.org/jira/browse/HBASE-28432. I was not aware that spotbugs > runs against the complete code base and not just the PR's changes, and I also > wanted to avoid changing code while we were just 'refactoring'. > Now that this code sits in main rather than test, we should fix all of the > spotbugs issues. > CC: [~ndimiduk], [~stoty] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28919) Soft drop for destructive table actions
[ https://issues.apache.org/jira/browse/HBASE-28919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28919: --- Labels: pull-request-available (was: ) > Soft drop for destructive table actions > --- > > Key: HBASE-28919 > URL: https://issues.apache.org/jira/browse/HBASE-28919 > Project: HBase > Issue Type: New Feature > Components: master, snapshots >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > Labels: pull-request-available > Attachments: Soft Drop for Destructive Table Actions.pdf > > > When we administratively drop a table column or entire table, or truncate a > table, the process begins rapidly. Procedures are scheduled for immediate > execution that then modify or remove descriptors and state in META and on > disk, and take unrecoverable actions at the HDFS layer. Although HFiles are > copied to the archive in a destructive action, recovery scenarios are not > automatic and involve some operator labor to reconstruct the table and > re-import the archived data. If the HFileCleaner is not properly configured > to facilitate such recovery then some data is not recoverable soon after > procedure execution commences and all affected data is not recoverable within > minutes. A customer faced with such an accident will be unhappy because the > recovery scenarios available to them from this will involve either a restore > from backup or from an earlier snapshot, and any changes committed more > recently than the time of the last backup or last snapshot will be lost. > An effective solution is very simple: We can easily prevent the deletion of > the HFiles of a deleted table or table column family by taking a snapshot of > the table immediately prior to taking any destructive actions. We set a TTL > on the snapshot so housekeeping of truly unwanted HFiles remains no touch. 
> Because we take a table snapshot all table structure and metadata is also > captured and saved so fast recovery is possible, as either a restore from > snapshot, or a clone from snapshot to a new table. For as long as the > snapshot is retained it is straightforward to recover the table data by > either restoring the table from the snapshot or cloning the snapshot to a new > table, at the operator’s discretion. > No manual actions are required to see the table or column family (or > families) truly dropped. Once the snapshot TTL expires all the HFiles related > to the dropped table become eligible for deletion. When the HFileCleaner > chore executes after that time the HDFS level file deletes will commence with > associated reduction in storage requirements. > Design document is attached. > I have a *working implementation* of this proposal based on a fork of > branch-2.5. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28931) RPC TLS certificate is not reloaded when in Kubernetes Secret directory
[ https://issues.apache.org/jira/browse/HBASE-28931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28931: --- Labels: pull-request-available (was: ) > RPC TLS certificate is not reloaded when in Kubernetes Secret directory > --- > > Key: HBASE-28931 > URL: https://issues.apache.org/jira/browse/HBASE-28931 > Project: HBase > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Charles Connell >Assignee: Charles Connell >Priority: Major > Labels: pull-request-available > > At my company we have an issue with our HBase servers not reloading TLS > certificate files after they change on disk. We run our HMasters inside > Kubernetes Pods, and define our certificate contents as Kubernetes Secrets. > Then, the Secrets are projected into the HMaster containers as files. When > the value of a Secret changes, the file changes automatically. However, > Kubernetes does some complicated indirection, and does not change the files > directly. It swaps a new directory in with new files in it. > HBase sets up a WatchService on the directory containing the TLS cert. For > example, at my company, the cert is at > {{{}/etc/hadoop/conf/ssl/cert/server-chain.pem{}}}. 
Then, events from that > WatchService are delivered to a [handler > method|https://github.com/apache/hbase/blob/836630422df2776287a860eff9d7104c3eca0582/hbase-common/src/main/java/org/apache/hadoop/hbase/io/crypto/tls/X509Util.java#L530] > which contains this check: > {code:java} > Path eventFilePath = dirPath.resolve((Path) event.context()); > if (filePath.equals(eventFilePath)) { > shouldResetContext = true; > }{code} > Debug logs show why this conditional is never true: > {code:java} > 2024-10-21T17:48:13,659 [FileChangeWatcher-server-chain.pem] DEBUG > org.apache.hadoop.hbase.io.FileChangeWatcher: Got file changed event: > ENTRY_CREATE with context: ..2024_10_21_17_48_13.2471317370 > 2024-10-21T17:48:13,659 [FileChangeWatcher-server-chain.pem] DEBUG > org.apache.hadoop.hbase.io.FileChangeWatcher: Got file changed event: > ENTRY_CREATE with context: ..2024_10_21_17_48_13.2471317370 > 2024-10-21T17:48:13,660 [FileChangeWatcher-server-chain.pem] DEBUG > org.apache.hadoop.hbase.io.crypto.tls.X509Util: Ignoring watch event and > keeping previous default SSL context. Event kind: ENTRY_CREATE with context: > ..2024_10_21_17_48_13.2471317370 > {code} > The watcher events have a variety of files attached to them, but none of them > are {{{}server-chain.pem{}}}, so HBase thinks they are not relevant. > I propose that we simply remove the condition inspecting the file name that > was changed, and always reload the SSL context if a watcher event fires. This > may lead to unnecessary reloads, but that will be harmless. -- This message was sent by Atlassian Jira (v8.20.10#820010)
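The failing conditional quoted above can be reproduced with nothing but java.nio.file. This is a minimal sketch (class and method names are illustrative, not HBase's) that mirrors the dirPath.resolve/equals check and shows why the timestamped directory Kubernetes swaps in never compares equal to the watched certificate path:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class WatchEventPathCheck {
    // Mirrors the conditional in X509Util's watch-event handler: reload only
    // when the event's resolved path equals the watched certificate file.
    static boolean shouldResetContext(Path certFile, Path watchedDir, Path eventContext) {
        Path eventFilePath = watchedDir.resolve(eventContext);
        return certFile.equals(eventFilePath);
    }

    public static void main(String[] args) {
        Path dir = Paths.get("/etc/hadoop/conf/ssl/cert");
        Path cert = dir.resolve("server-chain.pem");
        // An in-place edit of the watched file matches and triggers a reload.
        System.out.println(shouldResetContext(cert, dir, Paths.get("server-chain.pem"))); // true
        // Kubernetes instead creates a timestamped directory, so the event's
        // context never matches the cert path and no reload ever happens.
        System.out.println(shouldResetContext(cert, dir,
            Paths.get("..2024_10_21_17_48_13.2471317370"))); // false
    }
}
```

This matches the debug logs in the report: every event carries the `..2024_10_21…` directory as its context, so the equality check never fires.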
[jira] [Updated] (HBASE-28929) Set hadoop-three.version in Hadoop 3 backwards compatibility tests
[ https://issues.apache.org/jira/browse/HBASE-28929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28929: --- Labels: pull-request-available (was: ) > Set hadoop-three.version in Hadoop 3 backwards compatibility tests > -- > > Key: HBASE-28929 > URL: https://issues.apache.org/jira/browse/HBASE-28929 > Project: HBase > Issue Type: Bug > Components: backup&restore >Reporter: Istvan Toth >Priority: Major > Labels: pull-request-available > > Found by the new 3.4.0 nightly tests: > {noformat} > [ERROR] Errors: > [ERROR] > org.apache.hadoop.hbase.backup.TestBackupRestoreOnEmptyEnvironment.null > [ERROR] Run 1: > TestBackupRestoreOnEmptyEnvironment.testRestoreToCorrectTable:126->backup:223 > » TestTimedOut test timed out after 1560 seconds > [ERROR] Run 2: TestBackupRestoreOnEmptyEnvironment » Appears to be stuck > in thread MiniHBaseClusterRegionServer-EventLoopGroup-3-1 > [INFO] > [ERROR] > TestBackupRestoreOnEmptyEnvironment.testRestoreCorrectTableForIncremental:156->backup:223 > » NoClassDefFound org/apache/hadoop/util/Sets > [ERROR] > org.apache.hadoop.hbase.backup.TestIncrementalBackupMergeWithBulkLoad.null > [ERROR] Run 1: > TestIncrementalBackupMergeWithBulkLoad.testMergeContainingBulkloadedHfiles:127->backup:219 > » TestTimedOut test timed out after 1560 seconds > [ERROR] Run 2: TestIncrementalBackupMergeWithBulkLoad » Appears to be > stuck in thread MiniHBaseClusterRegionServer-EventLoopGroup-3-1 > [INFO] > [ERROR] > TestIncrementalBackupMergeWithBulkLoad.testMergeContainingBulkloadedHfiles:134->backup:219 > » NoClassDefFound org/apache/hadoop/util/Sets > [ERROR] > TestIncrementalBackupMergeWithBulkLoad.testMergeContainingBulkloadedHfiles:127->backup:219 > » InterruptedIO > [INFO] {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28928) Handle NPE in Split/Merge table when getMasterQuotaManager returns null
[ https://issues.apache.org/jira/browse/HBASE-28928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28928: --- Labels: pull-request-available (was: ) > Handle NPE in Split/Merge table when getMasterQuotaManager returns null > --- > > Key: HBASE-28928 > URL: https://issues.apache.org/jira/browse/HBASE-28928 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Affects Versions: 2.5.8 >Reporter: Aman Poonia >Assignee: Aman Poonia >Priority: Critical > Labels: pull-request-available > > Currently, when doing splits or merges, we notify the quota manager. > However, if the quota manager instance returned by the call below is null: > {code:java} > // code placeholder > env.getMasterServices().getMasterQuotaManager() {code} > then the following two lines will throw an unexpected NullPointerException, > which could instead be handled gracefully: > {code:java} > // SplitTableRegionProcedure.java > env.getMasterServices().getMasterQuotaManager().onRegionSplit(this.getParentRegion()); > {code} > {code:java} > // MergeTableRegionProcedure.java > > env.getMasterServices().getMasterQuotaManager().onRegionMerged(this.mergedRegion); > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
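The fix the report asks for amounts to a null check before the quota notification. A minimal sketch of that guard, assuming nothing beyond the description above (QuotaManager and notifySplit are hypothetical stand-ins, not HBase's real MasterQuotaManager API):

```java
public class QuotaNotify {
    // Hypothetical stand-in for MasterQuotaManager.
    interface QuotaManager {
        void onRegionSplit(String parentRegion);
    }

    // Skip quota bookkeeping rather than fail the whole split/merge
    // procedure when the quota manager has not been initialized.
    static boolean notifySplit(QuotaManager quotaManager, String parentRegion) {
        if (quotaManager == null) {
            return false; // nothing to notify; the procedure can continue
        }
        quotaManager.onRegionSplit(parentRegion);
        return true;
    }
}
```

The design choice here is that a missing quota manager degrades to "no quota accounting" instead of aborting the region split or merge.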
[jira] [Updated] (HBASE-28627) REST ScannerModel doesn't support includeStartRow/includeStopRow
[ https://issues.apache.org/jira/browse/HBASE-28627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28627: --- Labels: pull-request-available (was: ) > REST ScannerModel doesn't support includeStartRow/includeStopRow > > > Key: HBASE-28627 > URL: https://issues.apache.org/jira/browse/HBASE-28627 > Project: HBase > Issue Type: Bug > Components: REST >Reporter: Istvan Toth >Assignee: Chandra Sekhar K >Priority: Major > Labels: pull-request-available > > includeStartRow/includeStopRow should be transparently supported. > The current behaviour is limited and confusing, as the user would rightly > expect this to work via the REST interface. > The only problem is that adding them may break backwards compatibility. > Need to test if the XML unmarshaller can handle nonexistent fields. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28923) Prioritize "orphan" blocks for eviction inside BucketCache.freespace
[ https://issues.apache.org/jira/browse/HBASE-28923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28923: --- Labels: pull-request-available (was: ) > Prioritize "orphan" blocks for eviction inside BucketCache.freespace > > > Key: HBASE-28923 > URL: https://issues.apache.org/jira/browse/HBASE-28923 > Project: HBase > Issue Type: Improvement >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Labels: pull-request-available > > The Persistent Cache feature brought the ability to recover the cache in the > event of a restart or crash. Under certain conditions, the cache recovery can > lead to _orphan_ blocks hanging around in the cache, causing unnecessary extra > cache usage. > For example, when a region server crashes or restarts, the original regions > on that region server are immediately reassigned to the remaining > servers. Once the crashed/restarted server comes back online, the persistent > cache recovers the cache state prior to the crash/restart, which would > contain the blocks from the regions it was holding before the incident. > This can lead to the _orphan_ blocks situation described above under the > following conditions: > * If the balancer is off, or the cache-aware balancer fails to move back the > very same regions from before the crash; > * If compaction completes for the regions on the temporary servers. > Also, with evictsOnClose set to false (the default), any region move leaves > "orphan" blocks behind. > This issue proposes additional logic for identifying blocks not belonging to > any files from the current online regions inside the BucketCache.freeSpace > method. > This would need to modify both BlockCacheFactory and BucketCache to pass > along the map of online regions kept by HRegionServer. 
> Inside the BucketCache.freeSpace method, when iterating through the backingMap > and before separating the entries into the different eviction priority > groups, we can check whether the given entry belongs to a block from any of > the online regions' files, using the > BucketCache.evictBucketEntryIfNoRpcReferenced method to remove it if its file > is not found in any of the online regions. > An additional configurable grace period should be added, so that only blocks > cached before this grace period are considered potentially orphan. This avoids > evicting blocks from files currently being written by flushes/compactions when > "cache on write"/"cache on compaction" is enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
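The proposed freeSpace-time check can be modeled with plain collections. This is a toy sketch of the rule described above, not the real BucketCache code (the real change would walk the backingMap and call evictBucketEntryIfNoRpcReferenced; all names here are illustrative): a block is an orphan only if its file belongs to no online region and it was cached before the grace period.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

public class OrphanBlocks {
    // Illustrative cache entry: the HFile the block came from + when it was cached.
    record CachedBlock(String fileName, long cachedTimeMs) {}

    // Blocks whose file belongs to no online region AND which were cached
    // before the grace period are the "orphans" eligible for eviction. The
    // grace period protects blocks from files still being written by
    // flushes/compactions when cache-on-write is enabled.
    static List<CachedBlock> findOrphans(Collection<CachedBlock> backingMap,
            Set<String> onlineRegionFiles, long nowMs, long gracePeriodMs) {
        List<CachedBlock> orphans = new ArrayList<>();
        for (CachedBlock block : backingMap) {
            boolean withinGrace = nowMs - block.cachedTimeMs() < gracePeriodMs;
            if (!onlineRegionFiles.contains(block.fileName()) && !withinGrace) {
                orphans.add(block);
            }
        }
        return orphans;
    }
}
```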
[jira] [Updated] (HBASE-28858) Add rel/2.6.1 to the downloads page
[ https://issues.apache.org/jira/browse/HBASE-28858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28858: --- Labels: pull-request-available (was: ) > Add rel/2.6.1 to the downloads page > --- > > Key: HBASE-28858 > URL: https://issues.apache.org/jira/browse/HBASE-28858 > Project: HBase > Issue Type: Sub-task >Reporter: Nick Dimiduk >Assignee: Nick Dimiduk >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28924) TestPrefetch#testPrefetchWithDelay fails intermittently
[ https://issues.apache.org/jira/browse/HBASE-28924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28924: --- Labels: pull-request-available (was: ) > TestPrefetch#testPrefetchWithDelay fails intermittently > --- > > Key: HBASE-28924 > URL: https://issues.apache.org/jira/browse/HBASE-28924 > Project: HBase > Issue Type: Improvement >Reporter: Prathyusha >Assignee: Prathyusha >Priority: Major > Labels: pull-request-available > Attachments: Screenshot 2024-10-17 at 10.39.09 PM.png > > > TestPrefetch#testPrefetchWithDelay fails intermittently with > "AssertionError: Prefetch should start post configured delay" -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-27659) Incremental backups should re-use splits from last full backup
[ https://issues.apache.org/jira/browse/HBASE-27659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-27659: --- Labels: pull-request-available (was: ) > Incremental backups should re-use splits from last full backup > -- > > Key: HBASE-27659 > URL: https://issues.apache.org/jira/browse/HBASE-27659 > Project: HBase > Issue Type: Improvement >Reporter: Bryan Beaudreault >Assignee: Hernan Gelaf-Romer >Priority: Major > Labels: pull-request-available > > All incremental backups require a previous full backup. Full backups use > snapshots + ExportSnapshot, which includes exporting the SnapshotManifest. > The SnapshotManifest includes all of the regions in the table during the > snapshot. > Incremental backups use WALPlayer to turn new HLogs since the last backup into > HFiles. This uses HFileOutputFormat2, which writes HFiles along the split > boundaries of the tables at the time that it runs. > Active clusters may have regions split and merge over time, so the split > boundaries of incremental backup hfiles may not align to the original full > backup. This means we need to use MapReduceHFileSplitterJob during restore in > order to read all of the hfiles for all of the incremental backups and > re-split them based on the restored table. > * So let's say a cluster with regions A, B, C does a full backup. Data in > that backup will be segmented into those 3 regions. > * Over time the cluster splits and merges and we end up with totally > different regions D, E, F. An incremental backup occurs, and the data will be > segmented into those 3 regions. Later the cluster splits those 3 regions, so we > end up with new regions G, H, I, J, K, L. The next incremental backup is then > segmented along those regions. > When we go to restore this cluster, it'll pull the full backup and the 2 > incrementals. The full backup will get restored first, so the new table will > have regions A, B, C. 
Then all of the hfiles from the incrementals will be > combined together and run through MapReduceHFileSplitterJob. This will cause > all of those data files to get re-partitioned based on the A, B, C regions of > the newly restored table (based on the full backup). > This splitting process is expensive on a large cluster. We could skip it > entirely if incremental backups used the RegionInfos from the original full > backup SnapshotManifest as the splits for WALPlayer. Therefore, all > incremental backups will use the same splits as the original full backup. The > resulting hfiles could be directly bulkloaded without any split process, > reducing cost and time of restore. > One other benefit is that one could use the combination of a full backup + > all incremental backups as an input to their own mapreduce job. This is > impossible now because all of the backups will have HFiles with different > start/end keys which don't align to a common set of splits for combining into > ClientSideRegionScanner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
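The core of the proposal, reusing the full backup's region boundaries so later HFiles align, can be sketched as a tiny partitioner over the recorded start keys. This is a toy model (names are illustrative; the real change would feed the SnapshotManifest's boundaries into WALPlayer/HFileOutputFormat2): a row belongs to the region with the greatest start key that is still less than or equal to it.

```java
import java.util.Arrays;
import java.util.List;

public class BackupSplitReuse {
    // Given the sorted region start keys recorded at full-backup time,
    // return the index of the region a row falls into: the last start key
    // that compares <= the row. Reusing these boundaries for every
    // incremental backup keeps all HFiles aligned with the full backup.
    static int partitionFor(byte[] row, List<byte[]> sortedStartKeys) {
        int partition = 0;
        for (int i = 0; i < sortedStartKeys.size(); i++) {
            if (Arrays.compareUnsigned(sortedStartKeys.get(i), row) <= 0) {
                partition = i; // row is at or past this region's start
            } else {
                break; // keys are sorted; later regions start after the row
            }
        }
        return partition;
    }
}
```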
[jira] [Updated] (HBASE-28917) ColumnFamilyMismatchException mixes IA public and private
[ https://issues.apache.org/jira/browse/HBASE-28917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28917: --- Labels: pull-request-available (was: ) > ColumnFamilyMismatchException mixes IA public and private > - > > Key: HBASE-28917 > URL: https://issues.apache.org/jira/browse/HBASE-28917 > Project: HBase > Issue Type: Improvement > Components: backup&restore >Reporter: Hernan Gelaf-Romer >Assignee: Hernan Gelaf-Romer >Priority: Major > Labels: pull-request-available > > The current implementation of ColumnFamilyMismatchException is IA public, but > extends BackupException, which is IA private. > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28922) Bump commons-io:commons-io from 2.11.0 to 2.14.0
[ https://issues.apache.org/jira/browse/HBASE-28922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28922: --- Labels: pull-request-available (was: ) > Bump commons-io:commons-io from 2.11.0 to 2.14.0 > > > Key: HBASE-28922 > URL: https://issues.apache.org/jira/browse/HBASE-28922 > Project: HBase > Issue Type: Task > Components: dependabot, dependencies, security >Reporter: Duo Zhang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28921) Skip bundling hbase-webapps folder in jars
[ https://issues.apache.org/jira/browse/HBASE-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28921: --- Labels: pull-request-available (was: ) > Skip bundling hbase-webapps folder in jars > -- > > Key: HBASE-28921 > URL: https://issues.apache.org/jira/browse/HBASE-28921 > Project: HBase > Issue Type: Improvement > Components: security, UI >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > > We bundle all webapp resources in hbase-server and hbase-thrift, and > transitively in the hbase-shaded-mapreduce jar. This is a problem: if any of > the JS projects used by HBase is vulnerable, security scan tools like > Sonatype flag those jars as vulnerable too, since they contain the vulnerable > code. > With this JIRA, we want to stop bundling static webapp resources in our jars. > For example, Bootstrap 3.4.1, which is used by HBase, has had multiple medium > CVEs reported recently. See [https://security.snyk.io/package/npm/bootstrap/3.4.1] > for details. Because those webapp resources are bundled, Sonatype reports all > of the following jars as vulnerable: > |3|CVE-2024-6484|2.3|bootstrap 3.4.1| > |3|CVE-2024-6484|2.3|org.apache.hbase : hbase-server : 2.6.0| > |3|CVE-2024-6484|2.3|org.apache.hbase : hbase-shaded-mapreduce : 2.6.0| > |3|CVE-2024-6484|2.3|org.apache.hbase : hbase-thrift : 2.6.0| > It is wise to remove such files from our jars to avoid bigger hiccups in the > future. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28904) Supports enabling storage policy in the data copying scenario of bulkload
[ https://issues.apache.org/jira/browse/HBASE-28904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28904: --- Labels: pull-request-available (was: ) > Supports enabling storage policy in the data copying scenario of bulkload > - > > Key: HBASE-28904 > URL: https://issues.apache.org/jira/browse/HBASE-28904 > Project: HBase > Issue Type: Improvement >Reporter: Liangjun He >Assignee: Liangjun He >Priority: Major > Labels: pull-request-available > > In the current HBase bulkload scenario, if a tiered storage policy is set for > the column family of a table and the operation involves different HDFS > clusters, the storage policy for the data imported via bulkload will not take > effect. We hope to enable the automatic activation of tiered storage policy > in the data copying scenario of bulkload. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28837) Add row statistics coprocessor
[ https://issues.apache.org/jira/browse/HBASE-28837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28837: --- Labels: pull-request-available (was: ) > Add row statistics coprocessor > -- > > Key: HBASE-28837 > URL: https://issues.apache.org/jira/browse/HBASE-28837 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0, 3.0.0-beta-1 >Reporter: Evelyn Boland >Assignee: Evelyn Boland >Priority: Major > Labels: pull-request-available > > Goal: > Add a coprocessor to HBase that allows administrators to track high level > statistics on the rows and cells in their HBase tables. Administrators can > load this coprocessor into their RegionServers if they wish to gain more > visibility into the shape of their data in HBase. > At my day job, we've leveraged the statistics from this coprocessor to > automatically configure more optimal block sizes and smarter compaction > schedules for our fleet of nearly 200 HBase clusters. > Context: > Since HBase tables can store terabytes or even petabytes of data, HBase > administrators often have incomplete information about the data stored in > their HBase tables. Without a comprehensive understanding of the shape of > their data, it can be difficult for administrators to configure clusters for > a desired level of performance and/or reliability. Row statistics have the > potential to supercharge HBase management. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28911) Automatic SSL keystore reloading for HttpServer
[ https://issues.apache.org/jira/browse/HBASE-28911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28911: --- Labels: pull-request-available (was: ) > Automatic SSL keystore reloading for HttpServer > --- > > Key: HBASE-28911 > URL: https://issues.apache.org/jira/browse/HBASE-28911 > Project: HBase > Issue Type: Improvement > Components: UI >Reporter: YUBI LEE >Priority: Major > Labels: pull-request-available > > Since https://issues.apache.org/jira/browse/HADOOP-16524, Hadoop can reload its > SSL keystore without restarting. > A similar approach can be adopted in HBase. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28621) PrefixFilter should use SEEK_NEXT_USING_HINT
[ https://issues.apache.org/jira/browse/HBASE-28621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28621: --- Labels: beginner beginner-friendly pull-request-available (was: beginner beginner-friendly) > PrefixFilter should use SEEK_NEXT_USING_HINT > - > > Key: HBASE-28621 > URL: https://issues.apache.org/jira/browse/HBASE-28621 > Project: HBase > Issue Type: Improvement > Components: Filters >Reporter: Istvan Toth >Assignee: Dávid Paksy >Priority: Major > Labels: beginner, beginner-friendly, pull-request-available > > Looking at PrefixFilter, I have noticed that it doesn't use the > SEEK_NEXT_USING_HINT mechanism. > AFAICT, we could safely set the prefix as a next row hint, which could be > a huge performance win. > Of course, ideally the user would set the scan startRow to the prefix, which > avoids the problem, but the user may forget to do that, or may use the filter > in a filterList that doesn't allow for setting the start/stop rows close to > the prefix. -- This message was sent by Atlassian Jira (v8.20.10#820010)
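The hint logic itself is tiny. A sketch in plain Java of the idea described above, with all the Cell and Filter plumbing omitted (class and method names are illustrative, not HBase's): if the current row still sorts before the prefix, the filter can hand the prefix back as the next-row hint, which is what a SEEK_NEXT_USING_HINT return code would consume, letting the scanner jump straight to the prefix instead of stepping row by row.

```java
import java.util.Arrays;

public class PrefixSeekHint {
    // If the scanner is still before the prefix, return the prefix itself as
    // the next-row hint; otherwise there is no useful hint (null).
    static byte[] nextRowHint(byte[] currentRow, byte[] prefix) {
        return Arrays.compareUnsigned(currentRow, prefix) < 0 ? prefix : null;
    }
}
```

This is safe because HBase rows are ordered as unsigned byte arrays: no row between the current position and the prefix can match the prefix, so nothing is skipped.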
[jira] [Updated] (HBASE-28900) Avoid resetting bucket cache during restart if inconsistency is observed for some blocks.
[ https://issues.apache.org/jira/browse/HBASE-28900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28900: --- Labels: pull-request-available (was: ) > Avoid resetting bucket cache during restart if inconsistency is observed for > some blocks. > - > > Key: HBASE-28900 > URL: https://issues.apache.org/jira/browse/HBASE-28900 > Project: HBase > Issue Type: Bug > Components: BucketCache >Reporter: Janardhan Hungund >Assignee: Janardhan Hungund >Priority: Major > Labels: pull-request-available > > While the backing map is being persisted to the persistence file, it is not > guarded by a lock against block caching and block evictions. > Hence, some of the block entries in the persisted backing map may not be > consistent with the bucket cache. > During retrieval of the backing map from persistence, if an inconsistency > is detected, the complete bucket cache is discarded and rebuilt. > One of the errors seen is shown below: > {code:java} > 2024-09-30 08:58:33,840 WARN > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache: Can't restore from > file[/hadoopfs/ephfs1/bucketcache.map]. The bucket cache will be reset and > rebuilt. Exception seen: > org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocatorException: Couldn't > find match for index 26 in free list > at > org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator$Bucket.addAllocation(BucketAllocator.java:140) > at > org.apache.hadoop.hbase.io.hfile.bucket.BucketAllocator.(BucketAllocator.java:406) > at > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.retrieveFromFile(BucketCache.java:1486) > at > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.lambda$startPersistenceRetriever$0(BucketCache.java:377) > at java.base/java.lang.Thread.run(Thread.java:840) {code} > This retrieval can be optimised to only discard the inconsistent entries in > the persistent backing map and retain the remaining entries. 
The bucket cache > validator will throw away the inconsistent entry from the backing map. > Thanks, > Janardhan -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28898) Use reflection to access recoverLease(), setSafeMode() APIs.
[ https://issues.apache.org/jira/browse/HBASE-28898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28898: --- Labels: pull-request-available (was: ) > Use reflection to access recoverLease(), setSafeMode() APIs. > > > Key: HBASE-28898 > URL: https://issues.apache.org/jira/browse/HBASE-28898 > Project: HBase > Issue Type: Task > Components: Filesystem Integration >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0 > > > HBASE-27769 used the new Hadoop API (available since Hadoop 3.3.6/3.4.0) to > access recoverLease() and setSafeMode() APIs, and committed the change in a > feature branch. > However, until we move beyond Hadoop 3.3.6, we can not use them directly. > I'd like to propose to use reflection to access these APIs in the interim so > HBase can use Ozone sooner. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28906) Run nightly tests with multiple Hadoop 3 versions
[ https://issues.apache.org/jira/browse/HBASE-28906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28906: --- Labels: pull-request-available (was: ) > Run nightly tests with multiple Hadoop 3 versions > - > > Key: HBASE-28906 > URL: https://issues.apache.org/jira/browse/HBASE-28906 > Project: HBase > Issue Type: Sub-task > Components: integration tests, test >Reporter: Istvan Toth >Assignee: Istvan Toth >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28901) checkcompatibility.py can run maven commands with parallelism
[ https://issues.apache.org/jira/browse/HBASE-28901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28901: --- Labels: pull-request-available (was: ) > checkcompatibility.py can run maven commands with parallelism > - > > Key: HBASE-28901 > URL: https://issues.apache.org/jira/browse/HBASE-28901 > Project: HBase > Issue Type: Task > Components: create-release >Reporter: Nick Dimiduk >Assignee: Nick Dimiduk >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-1 > > > We can speed up the create-release process by taking advantage of maven > parallelism during creation of the API compatibility report. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28903) Incremental backup test missing explicit test for bulkloads
[ https://issues.apache.org/jira/browse/HBASE-28903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28903: --- Labels: pull-request-available (was: ) > Incremental backup test missing explicit test for bulkloads > --- > > Key: HBASE-28903 > URL: https://issues.apache.org/jira/browse/HBASE-28903 > Project: HBase > Issue Type: Improvement >Reporter: Hernan Gelaf-Romer >Priority: Major > Labels: pull-request-available > > Our incremental backup tests don't explicitly test our ability to backup and > restore bulkloads. It'd be nice to have this to verify bulkloads work in the > context of the backup/restore flow, and to avoid regressions in the future -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28905) Skip excessive evaluations of LINK_NAME_PATTERN and REF_NAME_PATTERN regular expressions
[ https://issues.apache.org/jira/browse/HBASE-28905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28905: --- Labels: pull-request-available (was: ) > Skip excessive evaluations of LINK_NAME_PATTERN and REF_NAME_PATTERN regular > expressions > > > Key: HBASE-28905 > URL: https://issues.apache.org/jira/browse/HBASE-28905 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.6.0, 3.0.0-beta-1, 2.7.0 >Reporter: Charles Connell >Assignee: Charles Connell >Priority: Minor > Labels: pull-request-available > Attachments: cpu_time_flamegraph_2.6.0.html, > cpu_time_flamegraph_with_optimization.html, > performance_test_query_latency_2.6.0.png, > performance_test_query_latency_with_optimization.png > > > To test if a file is a link file, HBase checks if its file name matches the > regex > {code:java} > ^(?:((?:[_\p{Digit}\p{IsAlphabetic}]+))(?:\=))?((?:[_\p{Digit}\p{IsAlphabetic}][-_.\p{Digit}\p{IsAlphabetic}]*))=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$ > {code} > To test if an HFile has a "reference name," HBase checks if its file name > matches the regex > {code:java} > ^([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?|^(?:((?:[_\p{Digit}\p{IsAlphabetic}]+))(?:\=))?((?:[_\p{Digit}\p{IsAlphabetic}][-_.\p{Digit}\p{IsAlphabetic}]*))=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$)\.(.+)$ > {code} > Matching against these big regexes is computationally expensive. HBASE-27474 > introduced (in 2.6.0) [code in a hot > path|https://github.com/apache/hbase/blob/1602c531b245b4d455b48161757cde2ec3d1930b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java#L1716] > in {{HFileReaderImpl}} that checks whether an HFile is a link or reference > file while deciding whether to cache blocks from that file. In flamegraphs > taken at my company during performance tests, these regex > evaluations took 2-3% of the CPU time on a busy RegionServer. 
> Later, the hot-path invocation of the regexes was removed in HBASE-28596 in > branch-2 and later, but not branch-2.6, so only the 2.6.x series suffers the > performance regression. Nonetheless, all invocations of these regexes are > still unnecessarily expensive and can be fast-failed easily. > The link name pattern contains a literal "=", so any string that does not > contain a "=" can be assumed to not match the regex. The reference name > pattern contains a literal ".", so any string that does not contain a "." can > be assumed to not match the regex. This optimization is mostly helpful in > 2.6.x, but is valid in all branches. > Running performance tests of this optimization removed the regex evaluations > from my flamegraphs entirely, and reduced query latency by 15%. Some charts > are attached. -- This message was sent by Atlassian Jira (v8.20.10#820010)
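The fast-fail described in HBASE-28905 above can be sketched in a few lines. This is an illustrative stand-alone version, not the actual HBase code: the class and method names (`LinkNameCheck`, `isHFileLink`) are hypothetical, but the pattern is the LINK_NAME_PATTERN quoted in the issue, and the key observation holds — the pattern contains a literal `=`, so a cheap `indexOf` check can reject most names before the regex engine ever runs.

```java
import java.util.regex.Pattern;

public class LinkNameCheck {
    // The LINK_NAME_PATTERN quoted in the issue description.
    private static final Pattern LINK_NAME_PATTERN = Pattern.compile(
        "^(?:((?:[_\\p{Digit}\\p{IsAlphabetic}]+))(?:\\=))?"
            + "((?:[_\\p{Digit}\\p{IsAlphabetic}][-_.\\p{Digit}\\p{IsAlphabetic}]*))"
            + "=((?:[a-f0-9]+))-([0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?)$");

    // Fast path: the pattern contains a literal '=', so any file name without
    // one cannot possibly match, and the expensive regex match can be skipped.
    public static boolean isHFileLink(String fileName) {
        if (fileName.indexOf('=') < 0) {
            return false;
        }
        return LINK_NAME_PATTERN.matcher(fileName).matches();
    }
}
```

The analogous check for REF_NAME_PATTERN would test for a literal `.` before matching, as the description notes.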
[jira] [Updated] (HBASE-28897) Incremental backups can be taken with incompatible column families
[ https://issues.apache.org/jira/browse/HBASE-28897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28897: --- Labels: pull-request-available (was: ) > Incremental backups can be taken with incompatible column families > -- > > Key: HBASE-28897 > URL: https://issues.apache.org/jira/browse/HBASE-28897 > Project: HBase > Issue Type: Bug > Components: backup&restore >Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1 >Reporter: Hernan Gelaf-Romer >Assignee: Hernan Gelaf-Romer >Priority: Major > Labels: pull-request-available > > Incremental backups can be taken even if the table descriptor of the current > table does not match the column families of the full backup for that same > table. When restoring the table, we choose to use the families of the full > backup. This can cause the restore process to fail if we add a column family > in the incremental backup that doesn't exist in the full backup. The bulkload > process will fail because it is trying to write column families that don't > exist in the restore table. > > I think the correct solution here is to prevent incremental backups from > being taken if the families of the current table don't match those of the > full backup. This will force users to instead take a full backup. -- This message was sent by Atlassian Jira (v8.20.10#820010)
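The guard proposed in HBASE-28897 above amounts to a pre-flight comparison of column-family sets. A minimal hypothetical sketch (`BackupFamilyCheck` and `familiesCompatible` are illustrative names, not HBase API):

```java
import java.util.Set;

public class BackupFamilyCheck {
    // Hypothetical pre-flight check for incremental backups: refuse to proceed
    // when the current table's column families differ from those captured in
    // the full backup, forcing the user to take a fresh full backup instead.
    public static boolean familiesCompatible(Set<String> fullBackupFamilies,
                                             Set<String> currentFamilies) {
        // Set.equals is order-independent, so this catches both added and
        // removed families.
        return fullBackupFamilies.equals(currentFamilies);
    }
}
```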
[jira] [Updated] (HBASE-28894) NPE on TestPrefetch.testPrefetchWithDelay
[ https://issues.apache.org/jira/browse/HBASE-28894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28894: --- Labels: pull-request-available (was: ) > NPE on TestPrefetch.testPrefetchWithDelay > - > > Key: HBASE-28894 > URL: https://issues.apache.org/jira/browse/HBASE-28894 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Labels: pull-request-available > > I'm seeing some failures on TestPrefetch.testPrefetchWithDelay in some > pre-commit runs; I believe this is due to a race condition in > PrefetchExecutor.loadConfiguration. > In these failures, it seems we are getting the NPE below: > {noformat} > Stacktrace: java.lang.NullPointerException > at > java.util.concurrent.ConcurrentSkipListMap.put(ConcurrentSkipListMap.java:1580) > at > org.apache.hadoop.hbase.io.hfile.PrefetchExecutor.request(PrefetchExecutor.java:108) > at > org.apache.hadoop.hbase.io.hfile.PrefetchExecutor.lambda$loadConfiguration$0(PrefetchExecutor.java:206) > at > java.util.concurrent.ConcurrentSkipListMap.forEach(ConcurrentSkipListMap.java:3269) > at > org.apache.hadoop.hbase.io.hfile.PrefetchExecutor.loadConfiguration(PrefetchExecutor.java:200) > at > org.apache.hadoop.hbase.regionserver.PrefetchExecutorNotifier.onConfigurationChange(PrefetchExecutorNotifier.java:51) > at > org.apache.hadoop.hbase.io.hfile.TestPrefetch.testPrefetchWithDelay(TestPrefetch.java:378) > {noformat} > I think this is because we are completing prefetch in this test before the > induced delay, then this test triggers a new configuration change, but the > prefetch thread calls PrefetchExecutor.complete just before the test thread > reaches [this > point|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/PrefetchExecutor.java#L206]: > {noformat} > 2024-10-01T11:28:10,660 DEBUG [Time-limited test {}] > hfile.PrefetchExecutor(102): Prefetch requested for 
> /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-6328/yetus-jdk8-hadoop2-check/src/hbase-server/target/test-data/b646497b-7616-6533-e8cb-98e5c9e2e083/TestPrefetchWithDelay/71eefdb271ae4f65b694a6ec3d4287a0, > delay=25000 ms > 2024-10-01T11:28:30,668 INFO [Time-limited test {}] hbase.Waiter(181): > Waiting up to [10,000] milli-secs(wait.for.ratio=[1]) > 2024-10-01T11:28:35,661 DEBUG [hfile-prefetch-1727782088576 {}] > hfile.HFilePreadReader$1(103): No entry in the backing map for cache key > 71eefdb271ae4f65b694a6ec3d4287a0_0. > ... > 2024-10-01T11:28:35,673 DEBUG [hfile-prefetch-1727782088576 {}] > hfile.HFilePreadReader$1(103): No entry in the backing map for cache key > 71eefdb271ae4f65b694a6ec3d4287a0_52849. > 2024-10-01T11:28:35,674 DEBUG [Time-limited test {}] > hfile.PrefetchExecutor(142): Prefetch cancelled for > /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-6328/yetus-jdk8-hadoop2-check/src/hbase-server/target/test-data/b646497b-7616-6533-e8cb-98e5c9e2e083/TestPrefetchWithDelay/71eefdb271ae4f65b694a6ec3d4287a0 > 2024-10-01T11:28:35,674 DEBUG [hfile-prefetch-1727782088576 {}] > hfile.PrefetchExecutor(121): Prefetch completed for > 71eefdb271ae4f65b694a6ec3d4287a0 > 2024-10-01T11:28:35,674 DEBUG [Time-limited test {}] > hfile.PrefetchExecutor(102): Prefetch requested for > /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-6328/yetus-jdk8-hadoop2-check/src/hbase-server/target/test-data/b646497b-7616-6533-e8cb-98e5c9e2e083/TestPrefetchWithDelay/71eefdb271ae4f65b694a6ec3d4287a0, > delay=991 ms > ... > {noformat} > CC: [~kabhishek4] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28890) RefCnt Leak error when caching index blocks at write time
[ https://issues.apache.org/jira/browse/HBASE-28890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28890: --- Labels: pull-request-available (was: ) > RefCnt Leak error when caching index blocks at write time > - > > Key: HBASE-28890 > URL: https://issues.apache.org/jira/browse/HBASE-28890 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Labels: pull-request-available > > Following [~bbeaudreault] works from HBASE-27170 that added the (very useful) > refcount leak detector, we sometimes see these reports on some branch-2 based > deployments: > {noformat} > 2024-09-25 10:06:42,413 ERROR > org.apache.hbase.thirdparty.io.netty.util.ResourceLeakDetector: LEAK: > RefCnt.release() was not called before it's garbage-collected. See > https://netty.io/wiki/reference-counted-objects.html for more information. > Recent access records: > Created at: > org.apache.hadoop.hbase.nio.RefCnt.(RefCnt.java:59) > org.apache.hadoop.hbase.nio.RefCnt.create(RefCnt.java:54) > org.apache.hadoop.hbase.nio.ByteBuff.wrap(ByteBuff.java:550) > > org.apache.hadoop.hbase.io.ByteBuffAllocator.allocate(ByteBuffAllocator.java:357) > > org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.cloneUncompressedBufferWithHeader(HFileBlock.java:1153) > > org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.getBlockForCaching(HFileBlock.java:1215) > > org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.lambda$writeIndexBlocks$0(HFileBlockIndex.java:997) > java.base/java.util.Optional.ifPresent(Optional.java:178) > > org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:996) > > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:635) > > org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:378) > > org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69) 
> > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:74) > > org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:831) > > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2033) > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2878) > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2620) > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2592) > > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2462) > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:602) > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:572) > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:65) > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:344) > {noformat} > It turns out that we always convert the block to a "on-heap" one, inside > LruBlockCache.cacheBlock, so when the index block is a SharedMemHFileBlock, > the blockForCaching instance in the code > [here|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java#L1076] > becomes eligible for GC without releasing buffers/decreasing refcount > (leak), right after we return the BlockIndexWriter.writeIndexBlocks call. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28733) Publish API docs for 2.6
[ https://issues.apache.org/jira/browse/HBASE-28733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28733: --- Labels: pull-request-available (was: ) > Publish API docs for 2.6 > > > Key: HBASE-28733 > URL: https://issues.apache.org/jira/browse/HBASE-28733 > Project: HBase > Issue Type: Task > Components: community, documentation >Reporter: Nick Dimiduk >Assignee: Dávid Paksy >Priority: Major > Labels: pull-request-available > > We have released 2.6 but the website has not been updated with the new API > docs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28888) Backport "HBASE-18382 [Thrift] Add transport type info to info server" to branch-2
[ https://issues.apache.org/jira/browse/HBASE-28888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28888: --- Labels: beginner pull-request-available (was: beginner) > Backport "HBASE-18382 [Thrift] Add transport type info to info server" to > branch-2 > -- > > Key: HBASE-28888 > URL: https://issues.apache.org/jira/browse/HBASE-28888 > Project: HBase > Issue Type: Improvement > Components: Thrift >Reporter: Lars George >Assignee: Nihal Jain >Priority: Minor > Labels: beginner, pull-request-available > Fix For: 3.0.0-alpha-1 > > > It would be really helpful to know if the Thrift server was started using the > HTTP or binary transport. Any additional info, like QOP settings for SASL > etc. would be great too. Right now the UI is very limited and shows > {{true/false}} for, for example, {{Compact Transport}}. I'd suggest > changing this to show something more useful like this: > {noformat} > Thrift Impl Type: non-blocking > Protocol: Binary > Transport: Framed > QOP: Authentication & Confidential > {noformat} > or > {noformat} > Protocol: Binary + HTTP > Transport: Standard > QOP: none > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28887) Fix broken link to mailing lists page
[ https://issues.apache.org/jira/browse/HBASE-28887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28887: --- Labels: pull-request-available (was: ) > Fix broken link to mailing lists page > - > > Key: HBASE-28887 > URL: https://issues.apache.org/jira/browse/HBASE-28887 > Project: HBase > Issue Type: Task > Components: documentation >Affects Versions: 4.0.0-alpha-1 >Reporter: Dávid Paksy >Priority: Minor > Labels: pull-request-available > > The Reference Guide (book) link to the mailing lists page -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28866) Setting `hbase.oldwals.cleaner.thread.size` to negative value will break HMaster and produce hard-to-diagnose logs
[ https://issues.apache.org/jira/browse/HBASE-28866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28866: --- Labels: pull-request-available (was: ) > Setting `hbase.oldwals.cleaner.thread.size` to negative value will break > HMaster and produce hard-to-diagnose logs > -- > > Key: HBASE-28866 > URL: https://issues.apache.org/jira/browse/HBASE-28866 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 2.4.2, 3.0.0-beta-1 >Reporter: Ariadne_team >Priority: Critical > Labels: pull-request-available > Fix For: 3.0.0-beta-2 > > Attachments: HBASE-28866-000-1.patch, HBASE-28866-000.patch > > > > Problem > - > HBase Master cannot be initialized with the following setting: > > hbase.oldwals.cleaner.thread.size > -1 > Default is 2 > > > After running the start-hbase.sh, the Master node could not be started due to > an exception: > {code:java} > ERROR [master/localhost:16000:becomeActiveMaster] master.HMaster: Failed to > become active master > java.lang.IllegalArgumentException: Illegal Capacity: -1 > at java.util.ArrayList.(ArrayList.java:157) > at > org.apache.hadoop.hbase.master.cleaner.LogCleaner.createOldWalsCleaner(LogCleaner.java:149) > at > org.apache.hadoop.hbase.master.cleaner.LogCleaner.(LogCleaner.java:80) > at > org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1329) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:917) > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2081) > at org.apache.hadoop.hbase.master.HMaster.lambda$0(HMaster.java:505) > at java.lang.Thread.run(Thread.java:750){code} > We were really confused and misled by the error log as the 'Illegal Capacity' > of ArrayList seems like an internal code issue. 
> > After we read the source code, we found that > "hbase.oldwals.cleaner.thread.size" is parsed and passed to the > createOldWalsCleaner() function without validation: > {code:java} > int size = conf.getInt(OLD_WALS_CLEANER_THREAD_SIZE, > DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE); this.oldWALsCleaner = > createOldWalsCleaner(size); {code} > The value of "hbase.oldwals.cleaner.thread.size" serves as the > initialCapacity of the ArrayList. If the configuration value is negative, an > IllegalArgumentException will be thrown: > {code:java} > private List createOldWalsCleaner(int size) { > ... > List oldWALsCleaner = new ArrayList<>(size); > ... > } {code} > > Solution (the attached patch) > - > The basic idea of the attached patch is to add a check and relevant logging > for this value during the initialization of the {{LogCleaner}} in the > constructor. This will help users better diagnose the issue. The detailed > patch is shown below. > {code:java} > @@ -78,6 +78,11 @@ > public class LogCleaner extends CleanerChore > pool, params, null); > this.pendingDelete = new LinkedBlockingQueue<>(); > int size = conf.getInt(OLD_WALS_CLEANER_THREAD_SIZE, > DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE); > + if (size <= 0) { > + LOG.warn("The size of old WALs cleaner thread is {}, which is invalid, > " > + + "the default value will be used.", size); > + size = DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE; > + } > this.oldWALsCleaner = createOldWalsCleaner(size); > this.cleanerThreadTimeoutMsec = > conf.getLong(OLD_WALS_CLEANER_THREAD_TIMEOUT_MSEC, > DEFAULT_OLD_WALS_CLEANER_THREAD_TIMEOUT_MSEC);{code} > Thanks! -- This message was sent by Atlassian Jira (v8.20.10#820010)
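The "Illegal Capacity" failure in HBASE-28866 above is simply the ArrayList constructor contract: a negative initialCapacity throws IllegalArgumentException. The attached patch boils down to a clamp like this (class and method names here are illustrative, not the HBase code):

```java
public final class OldWalsCleanerSizeGuard {
    // Default mirrors hbase.oldwals.cleaner.thread.size's documented default.
    static final int DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE = 2;

    // Fall back to the default when the configured value is non-positive,
    // instead of letting new ArrayList<>(size) throw IllegalArgumentException
    // deep inside HMaster startup.
    public static int sanitizeThreadSize(int configured) {
        if (configured <= 0) {
            return DEFAULT_OLD_WALS_CLEANER_THREAD_SIZE;
        }
        return configured;
    }
}
```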
[jira] [Updated] (HBASE-28884) SFT's BrokenStoreFileCleaner may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-28884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28884: --- Labels: pull-request-available (was: ) > SFT's BrokenStoreFileCleaner may cause data loss > > > Key: HBASE-28884 > URL: https://issues.apache.org/jira/browse/HBASE-28884 > Project: HBase > Issue Type: Bug >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > Labels: pull-request-available > > With the BrokenStoreFileCleaner enabled, one of our customers ran > into a data loss situation, probably due to a race condition where a region > was moved off the regionserver while the BrokenStoreFileCleaner was > checking that region's files' eligibility for deletion. We have seen that the > file got deleted by the given region server, around the same time the region > got closed on this region server. I believe a race condition during region > close is possible here: > 1) In BrokenStoreFileCleaner, for each region online on the given RS, we get > the list of files in the store dirs, then iterate through it [1]; > 2) For each file listed, we perform several checks, including this one [2] > that checks if the file is "active" > The problem is, if the region for the file we are checking got closed between > point #1 and #2, by the time we check if the file is active in [2], the store > may have already been closed as part of the region closure, so this check > would consider the file as deletable. > One simple solution is to check if the store's region is still open before > proceeding with deleting the file. > [1] > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BrokenStoreFileCleaner.java#L99 > [2] > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BrokenStoreFileCleaner.java#L133 -- This message was sent by Atlassian Jira (v8.20.10#820010)
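A hedged sketch of the guard proposed in HBASE-28884 above (the names are hypothetical, not the actual BrokenStoreFileCleaner API): re-check that the file's region is still open on this server immediately before deleting, so a region that moved away between the listing step [1] and the activity check [2] is left alone.

```java
public class BrokenFileDeleteGuard {
    // Hypothetical decision helper: a file is only deletable when its region is
    // still open on this regionserver AND the file is not an active store file.
    // If the region closed between listing and this check, the closed store can
    // no longer answer "is this file active?" reliably, so we must skip it.
    public static boolean shouldDelete(boolean regionStillOpen, boolean fileIsActive) {
        if (!regionStillOpen) {
            return false; // region moved or closed: do not trust stale store state
        }
        return !fileIsActive;
    }
}
```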
[jira] [Updated] (HBASE-28883) Manage hbase-thirdparty transitive dependencies via BOM pom
[ https://issues.apache.org/jira/browse/HBASE-28883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28883: --- Labels: pull-request-available (was: ) > Manage hbase-thirdparty transitive dependencies via BOM pom > --- > > Key: HBASE-28883 > URL: https://issues.apache.org/jira/browse/HBASE-28883 > Project: HBase > Issue Type: Task > Components: build, thirdparty >Reporter: Nick Dimiduk >Priority: Major > Labels: pull-request-available > > Despite the intentions to the contrary, there are several places where we > need the version of a dependency managed in hbase-thirdparty to match an > import in the main product (and maybe also in our other repos). Right now, > this is managed via comments in the poms, which read "when this changes > there, don't forget to update it here...". We can do better than this. > I think that hbase-thirdparty could publish a BOM pom file that can be > imported into any of the downstream hbase projects that make use of that > release of hbase-thirdparty. That will centralize management of these > dependencies in the hbase-thirdparty repo. > This blog post has a nice write-up on the idea, > https://www.garretwilson.com/blog/2023/06/14/improve-maven-bom-pattern -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28879) Bump hbase-thirdparty to 4.1.9
[ https://issues.apache.org/jira/browse/HBASE-28879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28879: --- Labels: pull-request-available (was: ) > Bump hbase-thirdparty to 4.1.9 > -- > > Key: HBASE-28879 > URL: https://issues.apache.org/jira/browse/HBASE-28879 > Project: HBase > Issue Type: Task > Components: dependencies, thirdparty >Reporter: Duo Zhang >Assignee: Nick Dimiduk >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28882) Backup restores are broken if the backup has moved locations
[ https://issues.apache.org/jira/browse/HBASE-28882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28882: --- Labels: pull-request-available (was: ) > Backup restores are broken if the backup has moved locations > > > Key: HBASE-28882 > URL: https://issues.apache.org/jira/browse/HBASE-28882 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0, 2.6.1 >Reporter: Ray Mattingly >Assignee: Ray Mattingly >Priority: Major > Labels: pull-request-available > > My company runs a few hundred HBase clusters. We want to take backups > everyday in one public cloud region, and then use said cloud's native > replication solution to "backup our backups" in a secondary region. This is > how we plan for region-wide disaster recovery. > This system should work, but doesn't because of the way that BackupManifests > are constructed. > Backing up a bit (no pun intended): when we replicate backups verbatim, the > manifest file continues to point to the original backup root. This shouldn't > matter, because when taking a restore one passes a RestoreRequest to the > RestoreTablesClient — and this RestoreRequest includes a BackupRootDir field. > This works as you would expect initially, but eventually we build a > BackupManifest that fails to interpolate this provided root directory and, > instead, falls back to what it finds on disk in the backup (which would point > back to the primary backup location, even if reading a replicated backup). > To fix this, I'm proposing that we properly interpolate the request's root > directory field when building BackupManifests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28880) ParseException may occur when getting the fileDate of the mob file recovered through snapshot
[ https://issues.apache.org/jira/browse/HBASE-28880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28880: --- Labels: pull-request-available (was: ) > ParseException may occur when getting the fileDate of the mob file recovered > through snapshot > - > > Key: HBASE-28880 > URL: https://issues.apache.org/jira/browse/HBASE-28880 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.4.13 > Environment: hbase2.4.13 > centos >Reporter: guluo >Assignee: guluo >Priority: Major > Labels: pull-request-available > > The ExpiredMobFileCleaner task may hit a ParseException when parsing MOB files > recovered through a snapshot, so these expired MOB files cannot be deleted. > > The Reason: > The ExpiredMobFileCleaner task obtains the MOB file creation time by parsing > the MOB filename. > In a regular MOB table, the 32nd to 40th characters of the MOB filename > indicate the file creation time, so ExpiredMobFileCleaner can get the creation > time of a MOB file from these characters. > However, in MOB tables recovered through a snapshot, the format of the MOB filename > is tableName-mobregionname-hfilename, so ExpiredMobFileCleaner may not be > able to obtain the creation time of the MOB file from the characters at > the above location. In this situation a ParseException occurs, and > these expired MOB files can never be deleted. -- This message was sent by Atlassian Jira (v8.20.10#820010)
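Assuming the layout the HBASE-28880 description above implies (a 32-character hash prefix followed by an eight-digit yyyyMMdd date), the failing parse can be reproduced in a few lines. `MobFileDate` and `parseMobFileDate` are illustrative names, not HBase code:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class MobFileDate {
    // Regular MOB file names carry an 8-digit yyyyMMdd date at characters
    // 32..40 (after a 32-character hash prefix). A file restored from a
    // snapshot is named tableName-mobregionname-hfilename instead, so parsing
    // the same offsets hits non-digit characters and throws ParseException,
    // which is what leaves the expired files undeleted.
    public static Date parseMobFileDate(String fileName) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setLenient(false); // reject anything that is not a real date
        return fmt.parse(fileName.substring(32, 40));
    }
}
```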
[jira] [Updated] (HBASE-26867) Introduce a FlushProcedure
[ https://issues.apache.org/jira/browse/HBASE-26867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-26867: --- Labels: pull-request-available (was: ) > Introduce a FlushProcedure > -- > > Key: HBASE-26867 > URL: https://issues.apache.org/jira/browse/HBASE-26867 > Project: HBase > Issue Type: New Feature > Components: proc-v2 >Reporter: ruanhui >Assignee: ruanhui >Priority: Minor > Labels: pull-request-available > Fix For: 2.6.0, 3.0.0-beta-1 > > > Reimplement proc-v1 based flush procedure in proc-v2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-27757) Clean up ScanMetrics API
[ https://issues.apache.org/jira/browse/HBASE-27757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-27757: --- Labels: pull-request-available (was: ) > Clean up ScanMetrics API > > > Key: HBASE-27757 > URL: https://issues.apache.org/jira/browse/HBASE-27757 > Project: HBase > Issue Type: Improvement >Reporter: Bryan Beaudreault >Assignee: Chandra Sekhar K >Priority: Major > Labels: pull-request-available > > The ScanMetrics object exposes public instance variables for all metrics. For > example, ScanMetrics.countOfRPCcalls. This is not standard API design in Java > or in HBase, and requires suppressing VisibilityModifier checkstyle warnings. > We should clean this up, but it would require a major version change since it's > part of the public API. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28382) Support building hbase-connectors with JDK17
[ https://issues.apache.org/jira/browse/HBASE-28382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28382: --- Labels: pull-request-available (was: ) > Support building hbase-connectors with JDK17 > > > Key: HBASE-28382 > URL: https://issues.apache.org/jira/browse/HBASE-28382 > Project: HBase > Issue Type: Sub-task > Components: hbase-connectors, java >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28790) hbase-connectors fails to build with hbase 2.6.0
[ https://issues.apache.org/jira/browse/HBASE-28790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28790: --- Labels: pull-request-available (was: ) > hbase-connectors fails to build with hbase 2.6.0 > > > Key: HBASE-28790 > URL: https://issues.apache.org/jira/browse/HBASE-28790 > Project: HBase > Issue Type: Bug > Components: build, hbase-connectors >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > > hbase-connectors fails to build with hbase 2.6.0 > {code:java} > [INFO] Reactor Summary for Apache HBase Connectors 1.1.0-SNAPSHOT: > [INFO] > [INFO] Apache HBase Connectors SUCCESS [ 4.377 > s] > [INFO] Apache HBase - Kafka ... SUCCESS [ 0.116 > s] > [INFO] Apache HBase - Model Objects for Kafka Proxy ... SUCCESS [ 3.222 > s] > [INFO] Apache HBase - Kafka Proxy . FAILURE [ 8.305 > s] > [INFO] Apache HBase - Spark ... SKIPPED > [INFO] Apache HBase - Spark Protocol .. SKIPPED > [INFO] Apache HBase - Spark Protocol (Shaded) . SKIPPED > [INFO] Apache HBase - Spark Connector . SKIPPED > [INFO] Apache HBase - Spark Integration Tests . SKIPPED > [INFO] Apache HBase Connectors - Assembly . SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 16.703 s > [INFO] Finished at: 2024-08-17T11:29:20Z > [INFO] > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile > (default-compile) on project hbase-kafka-proxy: Compilation failure > [ERROR] > /workspaces/hbase-connectors/kafka/hbase-kafka-proxy/src/main/java/org/apache/hadoop/hbase/kafka/KafkaBridgeConnection.java:[169,31] > is not > abstract and does not override abstract method > setRequestAttribute(java.lang.String,byte[]) in > org.apache.hadoop.hbase.client.TableBuilder {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28876) Should call ProcedureScheduler.completionCleanup for non-root procedures too
[ https://issues.apache.org/jira/browse/HBASE-28876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28876: --- Labels: pull-request-available (was: ) > Should call ProcedureScheduler.completionCleanup for non-root procedures too > -- > > Key: HBASE-28876 > URL: https://issues.apache.org/jira/browse/HBASE-28876 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Duo Zhang >Priority: Major > Labels: pull-request-available > > Per the discussion in this PR > https://github.com/apache/hbase/pull/6247 > And the related issue HBASE-28830, it seems incorrect that we only call > cleanup for root procedures. > This issue aims to see if there are any issues if we call this method for every procedure. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28875) FSHlog closewrite closeErrorCount should increment for initial catch exception
[ https://issues.apache.org/jira/browse/HBASE-28875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28875: --- Labels: pull-request-available (was: ) > FSHlog closewrite closeErrorCount should increment for initial catch exception > -- > > Key: HBASE-28875 > URL: https://issues.apache.org/jira/browse/HBASE-28875 > Project: HBase > Issue Type: Bug > Components: wal >Affects Versions: 2.7.0 >Reporter: Y. SREENIVASULU REDDY >Assignee: Y. SREENIVASULU REDDY >Priority: Minor > Labels: pull-request-available > > When closing the writer in FSHLog, if any error occurs, the closeErrorCount > counter should be incremented for the initial exception itself. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28868) Add missing permission check for updateRSGroupConfig in branch-2
[ https://issues.apache.org/jira/browse/HBASE-28868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28868: --- Labels: pull-request-available (was: ) > Add missing permission check for updateRSGroupConfig in branch-2 > > > Key: HBASE-28868 > URL: https://issues.apache.org/jira/browse/HBASE-28868 > Project: HBase > Issue Type: Task > Components: rsgroup >Affects Versions: 2.7.0 >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Minor > Labels: pull-request-available > > Found this during HBASE-28867: we do not have a security check for > updateRSGroupConfig in branch-2. See > [https://github.com/apache/hbase/blob/0dc334f572329be7eb2455cec3519fc820c04c25/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminEndpoint.java#L450] > The same check exists in master: > [https://github.com/apache/hbase/blob/52082bc5b80a60406bfaaa630ed5cb23027436c1/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java#L2279] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28871) [hbase-thirdparty] Bump dependency versions before releasing
[ https://issues.apache.org/jira/browse/HBASE-28871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28871: --- Labels: pull-request-available (was: ) > [hbase-thirdparty] Bump dependency versions before releasing > > > Key: HBASE-28871 > URL: https://issues.apache.org/jira/browse/HBASE-28871 > Project: HBase > Issue Type: Sub-task > Components: dependencies, thirdparty >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28869) [hbase-thirdparty] Bump protobuf java to 4.27.5+
[ https://issues.apache.org/jira/browse/HBASE-28869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28869: --- Labels: pull-request-available (was: ) > [hbase-thirdparty] Bump protobuf java to 4.27.5+ > > > Key: HBASE-28869 > URL: https://issues.apache.org/jira/browse/HBASE-28869 > Project: HBase > Issue Type: Task > Components: Protobufs, security, thirdparty >Reporter: Duo Zhang >Priority: Major > Labels: pull-request-available > Fix For: thirdparty-4.1.9 > > > For addressing CVE-2024-7254 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28867) Backport "HBASE-20653 Add missing observer hooks for region server group to MasterObserver" to branch-2
[ https://issues.apache.org/jira/browse/HBASE-28867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28867: --- Labels: pull-request-available (was: ) > Backport "HBASE-20653 Add missing observer hooks for region server group to > MasterObserver" to branch-2 > --- > > Key: HBASE-28867 > URL: https://issues.apache.org/jira/browse/HBASE-28867 > Project: HBase > Issue Type: Bug >Affects Versions: 2.6.0, 2.7.0, 2.5.10 >Reporter: Ted Yu >Assignee: Nihal Jain >Priority: Major > Labels: pull-request-available > > Currently the following region server group operations don't have a > corresponding hook in MasterObserver: > * getRSGroupInfo > * getRSGroupInfoOfServer > * getRSGroupInfoOfTable > * listRSGroup > This JIRA is to > * add them to MasterObserver > * add pre/post hook calls in RSGroupAdminEndpoint through > master.getMasterCoprocessorHost for the above operations > * add corresponding tests to TestRSGroups (in a similar manner to that of > HBASE-20627) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28864) NoMethodError undefined method assignment_expression?
[ https://issues.apache.org/jira/browse/HBASE-28864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28864: --- Labels: pull-request-available (was: ) > NoMethodError undefined method assignment_expression? > - > > Key: HBASE-28864 > URL: https://issues.apache.org/jira/browse/HBASE-28864 > Project: HBase > Issue Type: Bug > Components: shell >Affects Versions: 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0 >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2 > > > After HBASE-28250 (Bump jruby to 9.4.8.0 to fix snakeyaml CVE), the message > "NoMethodError undefined method assignment_expression?" is printed after > every command. > This is called from code copied from > https://github.com/ruby/irb/blob/v1.4.2/lib/irb.rb . The fix is to also copy > over the definition of `assignment_expression`. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28835) Make connector support for Decimal type
[ https://issues.apache.org/jira/browse/HBASE-28835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28835: --- Labels: pull-request-available (was: ) > Make connector support for Decimal type > --- > > Key: HBASE-28835 > URL: https://issues.apache.org/jira/browse/HBASE-28835 > Project: HBase > Issue Type: Improvement > Components: spark >Affects Versions: connector-1.0.0 >Reporter: yan.duan >Priority: Minor > Labels: pull-request-available > Fix For: connector-1.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28862) Change the generic type for ObserverContext from 'RegionCoprocessorEnvironment' to '? extends RegionCoprocessorEnvironment' in RegionObserver
[ https://issues.apache.org/jira/browse/HBASE-28862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28862: --- Labels: pull-request-available (was: ) > Change the generic type for ObserverContext from > 'RegionCoprocessorEnvironment' to '? extends RegionCoprocessorEnvironment' in > RegionObserver > - > > Key: HBASE-28862 > URL: https://issues.apache.org/jira/browse/HBASE-28862 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors, regionserver >Reporter: Duo Zhang >Priority: Major > Labels: pull-request-available > > This will be a breaking change for coprocessor implementation, but the > ability of region observer is not changed, so I think it is OK to include > this in 3.0.0 release, as we have already changed the coprocessor protobuf to > the relocated one, which already breaks lots of coprocessors. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28721) AsyncFSWAL is broken when running against hadoop 3.4.0
[ https://issues.apache.org/jira/browse/HBASE-28721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28721: --- Labels: pull-request-available (was: ) > AsyncFSWAL is broken when running against hadoop 3.4.0 > -- > > Key: HBASE-28721 > URL: https://issues.apache.org/jira/browse/HBASE-28721 > Project: HBase > Issue Type: Bug > Components: hadoop3, wal >Reporter: Duo Zhang >Assignee: Istvan Toth >Priority: Major > Labels: pull-request-available > > {noformat} > 2024-07-10T10:09:33,161 ERROR [master/localhost:0:becomeActiveMaster {}] > asyncfs.FanOutOneBlockAsyncDFSOutputHelper(258): Couldn't properly initialize > access to HDFS internals. Please update your WAL Provider to not make use of > the 'asyncfs' provider. See HBASE-16110 for more information. > java.lang.NoSuchMethodException: > org.apache.hadoop.hdfs.DFSClient.beginFileLease(long,org.apache.hadoop.hdfs.DFSOutputStream) > at java.lang.Class.getDeclaredMethod(Class.java:2675) ~[?:?] > at > org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createLeaseManager(FanOutOneBlockAsyncDFSOutputHelper.java:175) > ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] > at > org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.(FanOutOneBlockAsyncDFSOutputHelper.java:252) > ~[hbase-asyncfs-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] > at java.lang.Class.forName0(Native Method) ~[?:?] > at java.lang.Class.forName(Class.java:375) ~[?:?] > at > org.apache.hadoop.hbase.wal.AsyncFSWALProvider.load(AsyncFSWALProvider.java:149) > ~[classes/:?] > at > org.apache.hadoop.hbase.wal.WALFactory.getProviderClass(WALFactory.java:174) > ~[classes/:?] > at org.apache.hadoop.hbase.wal.WALFactory.(WALFactory.java:262) > ~[classes/:?] > at org.apache.hadoop.hbase.wal.WALFactory.(WALFactory.java:231) > ~[classes/:?] > at > org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:383) > ~[classes/:?] 
> at > org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:135) > ~[classes/:?] > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1003) > ~[classes/:?] > at > org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) > ~[classes/:?] > at > org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) > ~[classes/:?] > at > org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155) > ~[hbase-common-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] > at java.lang.Thread.run(Thread.java:840) ~[?:?] > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
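The NoSuchMethodException above comes from a reflective lookup hard-coded to the pre-3.4 beginFileLease parameter list; Hadoop 3.4.0 changed that signature. A common way to cope (a sketch only, with toy classes standing in for the two DFSClient generations, not the actual HBase fix) is to probe the old signature and fall back to the new one:

```java
import java.lang.reflect.Method;

// Toy stand-ins for two DFSClient generations whose beginFileLease
// signatures differ, as in the stack trace above.
class OldStyleClient { void beginFileLease(long inodeId, Object stream) {} }
class NewStyleClient { void beginFileLease(String key, Object stream) {} }

public class LeaseLookup {
  /** Probe the old (long, Object) signature first, then fall back to (String, Object). */
  static String leaseSignature(Class<?> clientClass) {
    try {
      Method m = clientClass.getDeclaredMethod("beginFileLease", long.class, Object.class);
      return "old:" + m.getParameterTypes()[0].getSimpleName();
    } catch (NoSuchMethodException e) {
      try {
        Method m = clientClass.getDeclaredMethod("beginFileLease", String.class, Object.class);
        return "new:" + m.getParameterTypes()[0].getSimpleName();
      } catch (NoSuchMethodException e2) {
        // Neither signature found: report it clearly, as the WAL provider does.
        throw new IllegalStateException("no known beginFileLease signature", e2);
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(leaseSignature(OldStyleClient.class)); // old:long
    System.out.println(leaseSignature(NewStyleClient.class)); // new:String
  }
}
```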
[jira] [Updated] (HBASE-28569) Race condition during WAL splitting leading to corrupt recovered.edits
[ https://issues.apache.org/jira/browse/HBASE-28569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28569: --- Labels: pull-request-available (was: ) > Race condition during WAL splitting leading to corrupt recovered.edits > -- > > Key: HBASE-28569 > URL: https://issues.apache.org/jira/browse/HBASE-28569 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 2.4.17 >Reporter: Benoit Sigoure >Priority: Major > Labels: pull-request-available > > There is a race condition that can happen when a regionserver aborts > initialisation while splitting a WAL from another regionserver. This race > leads to writing the WAL trailer for recovered edits while the writer threads > are still running, thus the trailer gets interleaved with the edits, > corrupting the recovered edits file (and preventing the region from being > assigned). > We've seen this happening on HBase 2.4.17, but looking at the latest code it > seems that the race can still happen there. > The sequence of operations that leads to this issue: > * {{org.apache.hadoop.hbase.wal.WALSplitter.splitWAL}} calls > {{outputSink.close()}} after adding all the entries to the buffers > * The output sink is > {{org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink}} and its {{close}} > method first calls {{finishWriterThreads}} in a try block, which in turn will > call {{finish}} on every thread and then join it to make sure it's done. > * However, if the splitter thread gets interrupted because the RS is aborting, > the join will get interrupted and {{finishWriterThreads}} will rethrow > without waiting for the writer threads to stop. > * This is problematic because, coming back to > {{org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink.close}}, it will call > {{closeWriters}} in a finally block (so it will execute even when the join > was interrupted). 
> * {{closeWriters}} will call > {{org.apache.hadoop.hbase.wal.AbstractRecoveredEditsOutputSink.closeRecoveredEditsWriter}} > which will call {{close}} on {{editWriter.writer}}. > * When {{editWriter.writer}} is > {{org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter}}, its > {{close}} method will write the trailer before closing the file. > * This trailer write will now go in parallel with writer threads writing > entries, causing corruption. > * If there are no other errors, {{closeWriters}} will succeed in renaming all > temporary files to final recovered edits, causing problems next time the > region is assigned. > Log evidence supporting the above flow: > Abort is triggered (because it failed to open the WAL due to some ongoing > infra issue): > {noformat} > regionserver-2 regionserver 06:22:00.384 > [RS_OPEN_META-regionserver/host01:16201-0] ERROR > org.apache.hadoop.hbase.regionserver.HRegionServer - * ABORTING region > server host01,16201,1709187641249: WAL can not clean up after init failed > *{noformat} > We can see that the writer threads were still active after closing (even > considering that the > ordering in the log might not be accurate, we see that they die because the > channel is closed while still writing, not because they're stopping): > {noformat} > regionserver-2 regionserver 06:22:09.662 [DataStreamer for file > /hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/03169600719-host02%2C16201%2C1709180140645.1709186722780.temp > block BP-1645452845-192.168.2.230-1615455682886:blk_1076340939_2645368] WARN > org.apache.hadoop.hdfs.DataStreamer - Error Recovery for > BP-1645452845-192.168.2.230-1615455682886:blk_1076340939_2645368 in pipeline > [DatanodeInfoWithStorage[192.168.2.230:15010,DS-2aa201ab-1027-47ec-b05f-b39d795fda85,DISK], > > DatanodeInfoWithStorage[192.168.2.232:15010,DS-39651d5a-67d2-4126-88f0-45cdee967dab,DISK], > Datanode > 
InfoWithStorage[192.168.2.231:15010,DS-e08a1d17-f7b1-4e39-9713-9706bd762f48,DISK]]: > datanode > 2(DatanodeInfoWithStorage[192.168.2.231:15010,DS-e08a1d17-f7b1-4e39-9713-9706bd762f48,DISK]) > is bad. > regionserver-2 regionserver 06:22:09.742 [split-log-closeStream-pool-1] INFO > org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink - Closed recovered edits > writer > path=hdfs://mycluster/hbase/data/default/aeris_v2/53308260a6b22eaf6ebb8353f7df3077/recovered.edits/03169600719-host02%2C16201% > 2C1709180140645.1709186722780.temp (wrote 5949 edits, skipped 0 edits in 93 > ms) > regionserver-2 regionserver 06:22:09.743 > [RS_LOG_REPLAY_OPS-regionserver/host01:16201-1-Writer-0] ERROR > org.apache.hadoop.hbase.wal.RecoveredEditsOutputSink - Failed to write log > entry aeris_v2/53308260a6b22eaf6ebb8353f7df3077/3169611655=[#edits: 8 = > ] to log > regionserver-2 regionserver
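The core of the race is a Java threading subtlety worth isolating: when the joining thread already has its interrupt flag set, {{Thread.join()}} throws immediately without waiting, so the joined thread is still running when the caller's finally block executes. A minimal, self-contained demonstration (plain threads standing in for the splitter and a writer thread, no HBase code):

```java
public class InterruptedJoinDemo {
  /**
   * Models the race described above: the "splitter" thread is interrupted
   * (RS abort), so Thread.join() in finishWriterThreads() throws at once,
   * and the "writer" thread is still alive when the finally-block close runs.
   */
  static boolean joinCutShortLeavesWriterAlive() {
    Thread writer = new Thread(() -> {
      try { Thread.sleep(2000); } catch (InterruptedException ignored) {}
    });
    writer.start();
    Thread.currentThread().interrupt();   // pending interrupt, as on RS abort
    boolean joinInterrupted = false;
    try {
      writer.join();                      // throws immediately, clearing our flag
    } catch (InterruptedException e) {
      joinInterrupted = true;             // finishWriterThreads() rethrows here
    }
    boolean writerStillRunning = writer.isAlive(); // close() would now race it
    writer.interrupt();                   // clean up the demo thread
    try { writer.join(); } catch (InterruptedException ignored) {}
    return joinInterrupted && writerStillRunning;
  }

  public static void main(String[] args) {
    System.out.println(joinCutShortLeavesWriterAlive());
  }
}
```

This is why closing the writers from the finally block can interleave the trailer write with still-running writer threads.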
[jira] [Updated] (HBASE-28797) New version of Region#getRowLock with timeout
[ https://issues.apache.org/jira/browse/HBASE-28797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28797: --- Labels: pull-request-available (was: ) > New version of Region#getRowLock with timeout > - > > Key: HBASE-28797 > URL: https://issues.apache.org/jira/browse/HBASE-28797 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.6.0, 3.0.0-beta-1 >Reporter: Viraj Jasani >Assignee: Chandra Sekhar K >Priority: Major > Labels: pull-request-available > > Region APIs are LimitedPrivate for Coprocs. One of the APIs provided by HBase > for Coproc use is to acquire row level read/write lock(s): > {code:java} > /** > * Get a row lock for the specified row. All locks are reentrant. Before > calling this function > * make sure that a region operation has already been started (the calling > thread has already > * acquired the region-close-guard lock). > * > * The obtained locks should be released after use by {@link > RowLock#release()} > * > * NOTE: the boolean passed here has changed. It used to be a boolean that > stated whether or not > * to wait on the lock. Now it is whether it an exclusive lock is requested. > * @param row The row actions will be performed against > * @param readLock is the lock reader or writer. True indicates that a > non-exclusive lock is > * requested > * @see #startRegionOperation() > * @see #startRegionOperation(Operation) > */ > RowLock getRowLock(byte[] row, boolean readLock) throws IOException; {code} > The implementation by default uses config "hbase.rowlock.wait.duration" as > row level lock timeout for both read and write locks. The default value is > quite high (~30s). > While updating the cluster level row lock timeout might not be worthwhile for > all use cases, having a new API that takes a timeout param would be really > helpful for critical latency-sensitive Coproc APIs. 
> > The new signature should be: > {code:java} > RowLock getRowLock(byte[] row, boolean readLock, int timeout) throws > IOException; {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
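The intended semantics of the proposed overload can be sketched with plain java.util.concurrent locks standing in for HBase row locks (this is an illustration of the timeout behavior, not the Region implementation; the class and method names here are made up):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Sketch of what getRowLock(row, readLock, timeout) would give coprocessors:
 * a bounded wait instead of blocking for the cluster-wide
 * hbase.rowlock.wait.duration (~30s by default).
 */
public class TimedRowLockSketch {
  private final ReentrantReadWriteLock rowLock = new ReentrantReadWriteLock();

  /** Like the proposed overload: returns false on timeout instead of blocking. */
  boolean tryRowLock(boolean readLock, int timeoutMs) throws InterruptedException {
    Lock l = readLock ? rowLock.readLock() : rowLock.writeLock();
    return l.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
  }

  /** A writer holds the row; a reader with a 50 ms budget gives up quickly. */
  static boolean contendedReadTimesOut() {
    TimedRowLockSketch row = new TimedRowLockSketch();
    row.rowLock.writeLock().lock();           // writer takes the exclusive lock
    final boolean[] acquired = new boolean[1];
    Thread reader = new Thread(() -> {
      try {
        acquired[0] = row.tryRowLock(true, 50); // reader waits at most 50 ms
      } catch (InterruptedException ignored) {
      }
    });
    reader.start();
    try { reader.join(); } catch (InterruptedException ignored) {}
    row.rowLock.writeLock().unlock();
    return !acquired[0];                      // timed out instead of waiting ~30s
  }

  public static void main(String[] args) {
    System.out.println(contendedReadTimesOut());
  }
}
```

A latency-sensitive coprocessor would call the timed variant and fail fast (or retry) rather than stall an RPC handler for the full cluster-wide wait duration.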
[jira] [Updated] (HBASE-28850) Only return from ReplicationSink.replicationEntries while all background tasks are finished
[ https://issues.apache.org/jira/browse/HBASE-28850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28850: --- Labels: pull-request-available (was: ) > Only return from ReplicationSink.replicationEntries while all background > tasks are finished > --- > > Key: HBASE-28850 > URL: https://issues.apache.org/jira/browse/HBASE-28850 > Project: HBase > Issue Type: Improvement > Components: Replication, rpc >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28645) Add build information to the REST server version endpoint
[ https://issues.apache.org/jira/browse/HBASE-28645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28645: --- Labels: pull-request-available (was: ) > Add build information to the REST server version endpoint > - > > Key: HBASE-28645 > URL: https://issues.apache.org/jira/browse/HBASE-28645 > Project: HBase > Issue Type: New Feature > Components: REST >Reporter: Istvan Toth >Priority: Minor > Labels: pull-request-available > > There is currently no way to check the REST server version / build number > remotely. > The */version/cluster* endpoint takes the version from master (fair enough), > and the */version/rest* does not include the build information. > We should add a version field to the /version/rest endpoint, which reports > the version of the REST server component. > We should also log this at startup, just like we log the cluster version now. > We may have to add and store the version in the hbase-rest code during build, > similarly to how we do it for the other components. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28846) Change the default Hadoop3 version to 3.4.0, and add tests to make sure HBase works with earlier supported Hadoop versions
[ https://issues.apache.org/jira/browse/HBASE-28846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28846: --- Labels: pull-request-available (was: ) > Change the default Hadoop3 version to 3.4.0, and add tests to make sure HBase > works with earlier supported Hadoop versions > -- > > Key: HBASE-28846 > URL: https://issues.apache.org/jira/browse/HBASE-28846 > Project: HBase > Issue Type: Improvement > Components: hadoop3, test >Affects Versions: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.7.0 >Reporter: Istvan Toth >Assignee: Istvan Toth >Priority: Major > Labels: pull-request-available > > Discussed on the mailing list: > https://lists.apache.org/thread/orc62x0v2ktvj26ltvrqpfgzr94ncswn -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28860) Add a metric of the amount of data written to WAL to determine the pressure of replication
[ https://issues.apache.org/jira/browse/HBASE-28860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28860: --- Labels: pull-request-available (was: ) > Add a metric of the amount of data written to WAL to determine the pressure > of replication > -- > > Key: HBASE-28860 > URL: https://issues.apache.org/jira/browse/HBASE-28860 > Project: HBase > Issue Type: Improvement >Reporter: terrytlu >Priority: Major > Labels: pull-request-available > > Add a metric of the amount of data written to WAL to determine the pressure > of replication. > Combined with the replication shipped size metric, the user can determine how > many RegionServers are needed to meet the data(WAL) writing requirements, > that is to achieve the goal of no replication lag. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HBASE-28845) table level wal appendSize and replication source metrics not correctly shown in /jmx response
[ https://issues.apache.org/jira/browse/HBASE-28845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HBASE-28845: --- Labels: pull-request-available (was: ) > table level wal appendSize and replication source metrics not correctly shown > in /jmx response > -- > > Key: HBASE-28845 > URL: https://issues.apache.org/jira/browse/HBASE-28845 > Project: HBase > Issue Type: Bug >Reporter: terrytlu >Assignee: terrytlu >Priority: Major > Labels: pull-request-available > Attachments: image-2024-09-18-11-21-10-279.png, > image-2024-09-18-11-21-20-295.png > > > Found that 2 metrics do not display in the /jmx HTTP interface response: table > level wal appendSize and table level replication source. > I suspect it's because the metric name contains a colon: > !image-2024-09-18-11-21-10-279.png|width=521,height=161! > > !image-2024-09-18-11-21-20-295.png|width=521,height=282! > > After modifying the table name string to "Namespace_$namespace_table_$table", > the metrics display correctly in the /jmx response. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
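The reporter's workaround amounts to a small name rewrite: HBase table names use the "namespace:table" form, and the colon apparently breaks the JMX key, so the name is rewritten into an underscore-separated form. A sketch of such a sanitizer (the exact key format here is an assumption modeled on the description above, not the actual HBase metric code):

```java
// Hypothetical sanitizer for table-level JMX metric keys: replace the
// colon-separated "ns:table" form with Namespace_<ns>_table_<table>,
// the shape the reporter says displays correctly in /jmx.
public class TableMetricKey {
  static String sanitize(String tableName) {
    int sep = tableName.indexOf(':');
    if (sep < 0) {
      // No namespace prefix: HBase treats such tables as "default" namespace.
      return "Namespace_default_table_" + tableName;
    }
    return "Namespace_" + tableName.substring(0, sep)
        + "_table_" + tableName.substring(sep + 1);
  }

  public static void main(String[] args) {
    System.out.println(sanitize("ns1:usertable")); // Namespace_ns1_table_usertable
    System.out.println(sanitize("usertable"));     // Namespace_default_table_usertable
  }
}
```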