[jira] [Created] (HBASE-27231) Make FSHLog retry writing WAL entries when syncs to HDFS fail.

2022-07-20 Thread chenglei (Jira)
chenglei created HBASE-27231:


 Summary: Make FSHLog retry writing WAL entries when syncs to HDFS 
fail.
 Key: HBASE-27231
 URL: https://issues.apache.org/jira/browse/HBASE-27231
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 3.0.0-alpha-4
Reporter: chenglei






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27230) RegionServer should abort when WAL.sync throws TimeoutIOException.

2022-07-20 Thread chenglei (Jira)
chenglei created HBASE-27230:


 Summary: RegionServer should abort when WAL.sync throws 
TimeoutIOException.
 Key: HBASE-27230
 URL: https://issues.apache.org/jira/browse/HBASE-27230
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 3.0.0-alpha-4
Reporter: chenglei








[jira] [Reopened] (HBASE-27152) Under compaction mark may leak

2022-07-20 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang reopened HBASE-27152:
---

This breaks several UTs.

Revert for now.

[~Xiaolin Ha] Please see TestCompaction.testCompactionQueuePriorities.

> Under compaction mark may leak
> --
>
> Key: HBASE-27152
> URL: https://issues.apache.org/jira/browse/HBASE-27152
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 3.0.0-alpha-2, 2.4.12
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-4
>
>
> HBASE-26249 introduced an under-compaction mark to reduce repeated 
> compactions of the same stores for a short period after bulk loading 
> files.
> Since the mark is added and removed in different threads,
> {code:java}
> pool.execute(
>   new CompactionRunner(store, region, compaction, tracker, completeTracker, 
> pool, user));
> if (LOG.isDebugEnabled()) {
>   LOG.debug(
> "Add compact mark for store {}, priority={}, current under compaction "
>   + "store size is {}",
> getStoreNameForUnderCompaction(store), priority, 
> underCompactionStores.size());
> }
> underCompactionStores.add(getStoreNameForUnderCompaction(store)); {code}
> it can happen that thread-1, which calls 
> underCompactionStores.add(), races with thread-2, which runs a 
> CompactionRunner that calls underCompactionStores.remove(). If the 
> removal happens before the addition, the mark leaks.
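
A minimal, self-contained sketch of the race and the obvious ordering fix (names and structure are illustrative, not the actual HBase patch): adding the mark before submitting the runner closes the window in which the runner's remove() can happen ahead of the add().

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class UnderCompactionMark {
    static final Set<String> underCompactionStores = ConcurrentHashMap.newKeySet();

    static void requestCompaction(String storeName, ExecutorService pool) {
        // Add the mark BEFORE handing the runner to the pool, so the runner's
        // remove() can never execute before this add() and leak the mark.
        underCompactionStores.add(storeName);
        pool.execute(() -> {
            // ... compaction work would happen here ...
            underCompactionStores.remove(storeName);
        });
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        requestCompaction("region1:cf1", pool);
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // With the reordered add, no mark is left behind once the runner finishes.
        System.out.println(underCompactionStores.isEmpty());
    }
}
```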





[jira] [Created] (HBASE-27229) BucketCache statistics should not count evictions by hfile

2022-07-20 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-27229:
-

 Summary: BucketCache statistics should not count evictions by hfile
 Key: HBASE-27229
 URL: https://issues.apache.org/jira/browse/HBASE-27229
 Project: HBase
  Issue Type: Improvement
Reporter: Bryan Beaudreault


The eviction metrics are helpful for determining how much access-related churn 
there is in your block cache. The LruBlockCache properly ignores evictions 
caused by HFile invalidation, but the BucketCache counts them. 

We should make the BucketCache work the same way, so that one can get an 
accurate view of the evictions caused by too much random access or too large 
a working set.





[jira] [Created] (HBASE-27228) Client connection warming API

2022-07-20 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-27228:
-

 Summary: Client connection warming API
 Key: HBASE-27228
 URL: https://issues.apache.org/jira/browse/HBASE-27228
 Project: HBase
  Issue Type: Improvement
Reporter: Bryan Beaudreault


In high-performance API servers or low-latency stream workers, you often do not 
want the first few requests to incur connection-setup costs. In these cases, you 
want to warm connections before ever adding the process to the load balancer or 
processing group.

Upon first creating a Connection, there are two areas that can slow down the 
first few requests:
 * Fetching region locations
 * Creating the initial connection to each RegionServer, which sends connection 
headers, possibly does auth handshakes, etc.

A user can easily work around the first slowness by calling 
Table.getRegionLocator().getAllRegionLocations().

It's more challenging for a user to warm the actual RegionServer connections. 
One way we have done this is to use a RegionLocator to fetch all locations for 
a table, reduce that down to 1 region per server, and then issue a Get to each 
row. We end up repeating this for every table that a process may connect to, 
because at the level we do this we can't easily tell which servers have already 
been warmed. We also have run into various bugs over time, for example where an 
empty startkey causes a Get to fail.

We can make this easier for users by providing an API that uses Connection 
internals to warm these connections as cheaply as possible. I'd propose we add 
the following:

New Table/AsyncTable method {{warmConnections()}}. This would do the 
following:
 * use region locator to fetch all locations (with caching)
 * reduce returned locations to unique ServerNames
 * for each ServerName (with lock):
 ** if already warmed, skip
 ** otherwise, get a connection to that server and send an initial request to 
trigger socket creation/connection header/etc

With this API, if someone is connecting to multiple tables, they could warm 
each Table in parallel, and we'd only create connections to each server 
once. 
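
The reduce-and-skip logic in the steps above can be sketched independently of the HBase client API. The names below are invented for illustration; a real implementation would work with RegionLocator, ServerName, and the actual RPC connection instead of plain strings.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ConnectionWarmer {
    // Tracks which servers have already been warmed, so repeated calls
    // (e.g. for multiple tables) skip servers they have seen before.
    private final Set<String> warmedServers = new LinkedHashSet<>();

    /**
     * Takes one server name per region location (duplicates expected) and
     * returns the servers that actually needed warming on this call.
     */
    public synchronized Set<String> warmConnections(List<String> regionServerNames) {
        // Reduce returned locations to unique ServerNames.
        Set<String> toWarm = new LinkedHashSet<>(regionServerNames);
        // If already warmed, skip.
        toWarm.removeAll(warmedServers);
        for (String server : toWarm) {
            // A real implementation would get a connection to the server here
            // and send an initial request to trigger socket creation, the
            // connection header, and any auth handshake.
            warmedServers.add(server);
        }
        return toWarm;
    }

    public static void main(String[] args) {
        ConnectionWarmer warmer = new ConnectionWarmer();
        // Table A: 4 regions spread over 2 servers -> 2 servers warmed.
        System.out.println(warmer.warmConnections(List.of("rs1", "rs2", "rs1", "rs2")).size());
        // Table B: shares rs2, adds rs3 -> only rs3 still needs warming.
        System.out.println(warmer.warmConnections(List.of("rs2", "rs3")).size());
    }
}
```

The per-call lock mirrors the "for each ServerName (with lock)" step, which is what lets concurrent warming of several tables create at most one warming request per server.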





[jira] [Created] (HBASE-27227) Long running heavily filtered scans hold up too many ByteBuffAllocator buffers

2022-07-20 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HBASE-27227:
-

 Summary: Long running heavily filtered scans hold up too many 
ByteBuffAllocator buffers
 Key: HBASE-27227
 URL: https://issues.apache.org/jira/browse/HBASE-27227
 Project: HBase
  Issue Type: Improvement
Reporter: Bryan Beaudreault


We have a workload which launches long-running scans searching for a needle 
in a haystack. They have a timeout of 60s, so they are allowed to run on the 
server for 30s. Most of the rows are filtered, and the final result is usually 
only a few KB.

When these scans are running, we notice our ByteBuffAllocator pool usage goes 
to 100% and we start seeing 100+ MB/s of heap allocations. When the scans 
finish, the pool goes back to normal and heap allocations go away.

My working theory here is that we only release ByteBuffs once we call 
{{shipper.shipped()}}, which only happens once a response is returned to the 
user. This works fine for normal scans, which are likely to quickly find enough 
results to return, but for long-running scans in which most of the results are 
filtered, we end up holding on to more and more buffers until the scan finally 
returns.

We should consider whether it's possible to release buffers for blocks whose 
cells have been completely skipped by a scan.
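
A hypothetical sketch of that idea (class and method names invented for illustration; this is not HBase's actual ByteBuff or Shipper API): track, per block, how many of its cells the scan may still reference, and release the block's buffer as soon as every cell has been skipped, instead of waiting for {{shipper.shipped()}}.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockBufferTracker {
    // Per-block count of cells the scan may still reference.
    private int pendingCells;
    private boolean released;
    private final String name;
    // Stands in for returning buffers to the ByteBuffAllocator pool.
    static final List<String> releaseLog = new ArrayList<>();

    BlockBufferTracker(String name, int cellCount) {
        this.name = name;
        this.pendingCells = cellCount;
    }

    /** Called when the scan's filter skips a cell without retaining it. */
    void cellSkipped() {
        if (--pendingCells == 0 && !released) {
            released = true;
            // In a real allocator this would return the backing ByteBuff to
            // the pool immediately, rather than holding it until shipped().
            releaseLog.add(name);
        }
    }

    public static void main(String[] args) {
        BlockBufferTracker block = new BlockBufferTracker("block-0", 3);
        block.cellSkipped();
        block.cellSkipped();
        block.cellSkipped(); // last cell skipped -> buffer released early
        System.out.println(releaseLog);
    }
}
```

The hard part in practice would be knowing when a block's cells are truly unreachable (partial results, retained references), which is why this is framed as something to investigate rather than a drop-in fix.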

 





[jira] [Created] (HBASE-27226) Document native TLS support in Netty RPC

2022-07-20 Thread Andor Molnar (Jira)
Andor Molnar created HBASE-27226:


 Summary: Document native TLS support in Netty RPC
 Key: HBASE-27226
 URL: https://issues.apache.org/jira/browse/HBASE-27226
 Project: HBase
  Issue Type: Task
  Components: documentation
Reporter: Andor Molnar
Assignee: Andor Molnar


Add a new section to the HBase book on how a developer can get this going. 
It should include:
 * the relevant TLS properties added in X509Util.java which need to be set in 
hbase-site.xml
 * how to generate a self-signed CA and certs using 
{{keytool}}/{{openssl}}
 * any known limitations


