[jira] [Created] (HBASE-27231) Make FSHLog retry writing WAL entries when syncs to HDFS failed.
chenglei created HBASE-27231:
---------------------------------

             Summary: Make FSHLog retry writing WAL entries when syncs to HDFS failed.
                 Key: HBASE-27231
                 URL: https://issues.apache.org/jira/browse/HBASE-27231
             Project: HBase
          Issue Type: Improvement
          Components: wal
    Affects Versions: 3.0.0-alpha-4
            Reporter: chenglei


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (HBASE-27230) RegionServer should abort when WAL.sync throws TimeoutIOException.
chenglei created HBASE-27230:
---------------------------------

             Summary: RegionServer should abort when WAL.sync throws TimeoutIOException.
                 Key: HBASE-27230
                 URL: https://issues.apache.org/jira/browse/HBASE-27230
             Project: HBase
          Issue Type: Bug
          Components: wal
    Affects Versions: 3.0.0-alpha-4
            Reporter: chenglei
[jira] [Reopened] (HBASE-27152) Under compaction mark may leak
[ https://issues.apache.org/jira/browse/HBASE-27152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang reopened HBASE-27152:
-------------------------------

This breaks several UTs. Revert for now.

[~Xiaolin Ha] Please see TestCompaction.testCompactionQueuePriorities.

> Under compaction mark may leak
> ------------------------------
>
>                 Key: HBASE-27152
>                 URL: https://issues.apache.org/jira/browse/HBASE-27152
>             Project: HBase
>          Issue Type: Bug
>          Components: Compaction
>    Affects Versions: 3.0.0-alpha-2, 2.4.12
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-4
>
> HBASE-26249 introduced an under-compaction mark to reduce repeated
> compactions of the same stores for a short period of time after bulk loading
> files. Since the mark is added and removed in different threads,
> {code:java}
> pool.execute(
>   new CompactionRunner(store, region, compaction, tracker, completeTracker, pool, user));
> if (LOG.isDebugEnabled()) {
>   LOG.debug(
>     "Add compact mark for store {}, priority={}, current under compaction "
>       + "store size is {}",
>     getStoreNameForUnderCompaction(store), priority, underCompactionStores.size());
> }
> underCompactionStores.add(getStoreNameForUnderCompaction(store)); {code}
> thread-1 calling underCompactionStores.add() can race with thread-2, which
> runs a CompactionRunner that calls underCompactionStores.remove(). If the
> remove happens before the add, the mark leaks.
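The fix direction is visible in the quoted snippet: the mark must be registered before the task is handed to the pool, so the worker's remove() can never run ahead of the add(). A minimal, self-contained sketch of that ordering (plain Java with illustrative names, not the actual HBase classes):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class UnderCompactionMark {
  // Mirrors underCompactionStores in the snippet above.
  static final Set<String> underCompactionStores = ConcurrentHashMap.newKeySet();

  static void requestCompaction(ExecutorService pool, String storeName) {
    // Add the mark BEFORE submitting the runner, so the worker thread's
    // remove() always observes it and cannot leave a stale mark behind.
    underCompactionStores.add(storeName);
    pool.execute(() -> {
      try {
        // ... run the compaction ...
      } finally {
        underCompactionStores.remove(storeName);
      }
    });
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    requestCompaction(pool, "region1:family1");
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
    System.out.println(underCompactionStores.isEmpty() ? "no leak" : "leaked");
  }
}
```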
[jira] [Created] (HBASE-27229) BucketCache statistics should not count evictions by hfile
Bryan Beaudreault created HBASE-27229:
-----------------------------------------

             Summary: BucketCache statistics should not count evictions by hfile
                 Key: HBASE-27229
                 URL: https://issues.apache.org/jira/browse/HBASE-27229
             Project: HBase
          Issue Type: Improvement
            Reporter: Bryan Beaudreault


The eviction metrics are helpful for determining how much access-related churn there is in your block cache. The LruBlockCache properly ignores evictions caused by HFile invalidation, but the BucketCache counts them. We should make the BucketCache work the same way, so that one can get an accurate view of the evictions caused by too much random access or too large a working set.
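The proposed distinction can be sketched as follows (a toy cache with hypothetical names, not the BucketCache API): the eviction counter is bumped only for pressure-driven evictions, while HFile-invalidation removals bypass the stats.

```java
import java.util.HashMap;
import java.util.Map;

public class EvictionStats {
  final Map<String, byte[]> blocks = new HashMap<>(); // key: "hfileName,offset"
  long evictedCount = 0; // what the eviction metric should report

  void cacheBlock(String key, byte[] data) {
    blocks.put(key, data);
  }

  // evictedByHFile == true models invalidation because the HFile was removed
  // (e.g. compacted away): that is bookkeeping, not access-related churn.
  void evictBlock(String key, boolean evictedByHFile) {
    if (blocks.remove(key) != null && !evictedByHFile) {
      evictedCount++; // count only pressure/access-driven evictions
    }
  }

  public static void main(String[] args) {
    EvictionStats cache = new EvictionStats();
    cache.cacheBlock("hfileA,0", new byte[8]);
    cache.cacheBlock("hfileB,0", new byte[8]);
    cache.evictBlock("hfileA,0", false); // pressure: counted
    cache.evictBlock("hfileB,0", true);  // HFile invalidation: not counted
    System.out.println(cache.evictedCount); // prints 1
  }
}
```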
[jira] [Created] (HBASE-27228) Client connection warming API
Bryan Beaudreault created HBASE-27228:
-----------------------------------------

             Summary: Client connection warming API
                 Key: HBASE-27228
                 URL: https://issues.apache.org/jira/browse/HBASE-27228
             Project: HBase
          Issue Type: Improvement
            Reporter: Bryan Beaudreault


In a high-performance API or a low-latency stream worker, you often do not want to incur extra cost on the first few requests. In these cases, you want to warm connections before ever adding the process to the load balancer or processing group. Upon first creating a Connection, two areas can slow down the first few requests:
* Fetching region locations
* Creating the initial connection to each RegionServer, which sends connection headers, possibly does auth handshakes, etc.

A user can easily work around the first by calling Table.getRegionLocator().getAllRegionLocations(). It's more challenging to warm the actual RegionServer connections. One way we have done this is to use a RegionLocator to fetch all locations for a table, reduce that down to 1 region per server, and then issue a Get to each row. We end up repeating this for every table a process may connect to, because at the level we do this we can't easily tell which servers have already been warmed. We have also run into various bugs over time, for example where an empty startkey causes a Get to fail.

We can make this easier for users by providing an API that uses Connection internals to warm these connections as cheaply as possible. I'd propose we add a new Table/AsyncTable method {{warmConnections()}}. This would do the following:
* use the region locator to fetch all locations (with caching)
* reduce the returned locations to unique ServerNames
* for each ServerName (with a lock):
** if already warmed, skip
** otherwise, get a connection to that server and send an initial request to trigger socket creation, connection header, etc.

With this API, if someone is connecting to multiple tables, they could warm each Table in parallel and we'd only create connections to each server once.
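The per-ServerName bookkeeping described above could look roughly like this (an illustrative sketch: ServerName is reduced to a String and the warming RPC is stubbed out, so none of these names are real HBase client API):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConnectionWarmer {
  private final Set<String> warmedServers = new HashSet<>();
  int connectionsOpened = 0; // for illustration only

  // serverNames: the unique servers derived from a table's region locations.
  public synchronized void warmConnections(List<String> serverNames) {
    for (String server : serverNames) {
      if (!warmedServers.add(server)) {
        continue; // already warmed, possibly while warming another table
      }
      // Real implementation: open the RPC connection and send an initial
      // request to trigger socket creation, connection header, auth, etc.
      connectionsOpened++;
    }
  }

  public static void main(String[] args) {
    ConnectionWarmer warmer = new ConnectionWarmer();
    warmer.warmConnections(List.of("rs1:16020", "rs2:16020"));
    warmer.warmConnections(List.of("rs2:16020", "rs3:16020")); // second table
    System.out.println(warmer.connectionsOpened); // prints 3: rs2 warmed once
  }
}
```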
[jira] [Created] (HBASE-27227) Long running heavily filtered scans hold up too many ByteBuffAllocator buffers
Bryan Beaudreault created HBASE-27227:
-----------------------------------------

             Summary: Long running heavily filtered scans hold up too many ByteBuffAllocator buffers
                 Key: HBASE-27227
                 URL: https://issues.apache.org/jira/browse/HBASE-27227
             Project: HBase
          Issue Type: Improvement
            Reporter: Bryan Beaudreault


We have a workload which launches long-running scans searching for a needle in a haystack. They have a timeout of 60s, so are allowed to run on the server for 30s. Most of the rows are filtered, and the final result is usually only a few KB.

When these scans are running, we notice our ByteBuffAllocator pool usage goes to 100% and we start seeing 100+ MB/s of heap allocations. When the scans finish, the pool goes back to normal and the heap allocations go away.

My working theory is that we only release ByteBuffs once we call {{shipper.shipped()}}, which happens only once a response is returned to the user. This works fine for normal scans, which are likely to quickly find enough results to return, but for long-running scans in which most of the results are filtered we end up holding on to more and more buffers until the scan finally returns.

We should consider whether it's possible to release buffers for blocks whose cells have been completely skipped by a scan.
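One possible direction, sketched with a toy reference count (hypothetical; HBase's actual ByteBuff/Shipper wiring differs): give each block buffer its own reference count and drop the scanner's reference as soon as the block is exhausted, so a block whose cells were all filtered goes back to the pool immediately instead of waiting for {{shipper.shipped()}}.

```java
public class EarlyBlockRelease {
  // Toy stand-in for a pooled, ref-counted ByteBuff-backed block.
  static class Block {
    private int refCount = 1; // held by the scanner while reading
    boolean returnedToPool = false;

    void retain() { refCount++; }
    void release() {
      if (--refCount == 0) {
        returnedToPool = true; // buffer goes back to the allocator pool
      }
    }
  }

  // anyCellReturned == false models a block fully skipped by the filter.
  static Block scanBlock(boolean anyCellReturned) {
    Block block = new Block();
    if (anyCellReturned) {
      block.retain(); // the pending response holds a reference until shipped()
    }
    block.release(); // the scanner is done with the block either way
    return block;    // fully filtered blocks are already back in the pool here
  }

  public static void main(String[] args) {
    System.out.println(scanBlock(false).returnedToPool); // prints true
    System.out.println(scanBlock(true).returnedToPool);  // prints false
  }
}
```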
[jira] [Created] (HBASE-27226) Document native TLS support in Netty RPC
Andor Molnar created HBASE-27226:
------------------------------------

             Summary: Document native TLS support in Netty RPC
                 Key: HBASE-27226
                 URL: https://issues.apache.org/jira/browse/HBASE-27226
             Project: HBase
          Issue Type: Task
          Components: documentation
            Reporter: Andor Molnar
            Assignee: Andor Molnar


Add a new section to the HBase book on how a developer can get this going. It should include:
* the relevant TLS properties added in X509Util.java which need to be added to hbase-site.xml
* how to generate a self-signed CA and certs using {{keytool}}/{{openssl}}
* any known limitations
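For the self-signed CA/cert bullet, the book section could start from something like this (example filenames and subjects only, not an official HBase recipe; real deployments need proper subjects, key sizes, and SANs):

```shell
# Create a self-signed CA (illustrative names)
openssl req -x509 -newkey rsa:2048 -days 365 -nodes \
  -subj "/CN=test-ca" -keyout ca-key.pem -out ca-cert.pem

# Create a server key and CSR, then sign the CSR with the CA
openssl req -newkey rsa:2048 -nodes \
  -subj "/CN=regionserver.example.com" \
  -keyout server-key.pem -out server-csr.pem
openssl x509 -req -in server-csr.pem -CA ca-cert.pem -CAkey ca-key.pem \
  -CAcreateserial -days 365 -out server-cert.pem

# Sanity check: the server cert chains to the CA
openssl verify -CAfile ca-cert.pem server-cert.pem
```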