Viraj Jasani created HBASE-26466:
------------------------------------

             Summary: Immutable timeseries usecase - Create new region rather 
than split existing one
                 Key: HBASE-26466
                 URL: https://issues.apache.org/jira/browse/HBASE-26466
             Project: HBase
          Issue Type: Brainstorming
            Reporter: Viraj Jasani


For the immutable-data use case (specifically time-series data), the region
split mechanism doesn't seem to help availability when the ingestion rate is
very high. When we ingest a lot of data, the region split policy tries to split
the hot region based on size (either the combined size of all stores or the
size of any single store exceeding the configured max file size), if we
consider the default SteppingSplitPolicy. The most recent region tends to
receive all of the latest inserts. When that region is split, the first half
(say daughterA) stays on the same server, whereas the second half (daughterB)
is moved out to other servers in the cluster. daughterB is likely to become
the next hot region, because with sequential writes all new updates land in
the second half. Hence, once the new daughter region is created, client
traffic is redirected to another server. Client requests pile up from the
moment the split is triggered until the new daughters come online; after that,
the client has to look up meta for the updated daughter region and redirect
its traffic to the new server.
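
A rough sketch of the effect described above, as a toy model in plain Java. It
uses no HBase classes; SequentialSplitSketch and its methods are made-up names
for illustration, and each region is reduced to a start key plus a counter of
the writes it has received since it was opened.

```java
import java.util.TreeMap;

// Toy model only, not HBase code.
class SequentialSplitSketch {
    // start key -> writes received since the region was opened
    final TreeMap<String, Integer> regions = new TreeMap<>();

    SequentialSplitSketch() {
        regions.put("", 0); // one region covering the whole key space
    }

    // Route a row to the region with the greatest start key <= rowKey.
    String write(String rowKey) {
        String start = regions.floorKey(rowKey);
        regions.merge(start, 1, Integer::sum);
        return start;
    }

    // Split at splitKey: both daughters start with a fresh counter. The lower
    // daughter keeps the parent's start key, the upper one begins at splitKey.
    void split(String splitKey) {
        String lowerStart = regions.floorKey(splitKey);
        regions.put(lowerStart, 0);
        regions.put(splitKey, 0);
    }
}
```

If rows ts-0001 to ts-0100 are written, the region is split at ts-0050, and
rows ts-0101 to ts-0200 follow, every post-split write in this model lands in
the upper daughter and the lower daughter receives none.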

If we could have a configurable region creation strategy that 1) keeps splits
disabled for the given table, and 2) dynamically creates a new region with a
lexicographically higher start key on the same server and updates the existing
region's boundary, the client would have to look up meta only once and could
continue ingestion without the degraded SLA caused by region split transitions.
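
To make the idea concrete, here is a minimal sketch of such a strategy as a toy
model in plain Java. Nothing here is an existing HBase API; AppendRegionSketch,
appendRegion, locate and endKeyOf are hypothetical names. A region is just a
start key mapped to a server name, and a region's end key is taken to be the
next region's start key (empty string meaning open-ended).

```java
import java.util.TreeMap;

// Toy model only, not HBase code.
class AppendRegionSketch {
    // start key -> hosting server
    final TreeMap<String, String> regionToServer = new TreeMap<>();

    AppendRegionSketch(String initialServer) {
        regionToServer.put("", initialServer); // single region, whole key space
    }

    // Server currently hosting the region that owns rowKey.
    String locate(String rowKey) {
        return regionToServer.floorEntry(rowKey).getValue();
    }

    // End key of the region starting at startKey ("" if it is the last one).
    String endKeyOf(String startKey) {
        String next = regionToServer.higherKey(startKey);
        return next == null ? "" : next;
    }

    // Instead of splitting, open a new region above the current last region
    // and keep it on the same server. The previous last region's end key
    // implicitly becomes newStartKey.
    void appendRegion(String newStartKey) {
        String lastStart = regionToServer.lastKey();
        if (newStartKey.compareTo(lastStart) <= 0) {
            throw new IllegalArgumentException(
                "new start key must sort after the last region's start key");
        }
        regionToServer.put(newStartKey, regionToServer.get(lastStart));
    }
}
```

In this model, locate() returns the same server before and after
appendRegion(), which is the property the proposal is after: new writes keep
going to the server the client already knows about.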

Note: a region split might also run into complications that require the
procedure to be rolled back from some step or to continue with internal
retries, further delaying ingestion from clients.

There are some complications around updating a live region's start and end
keys, as this key range is currently immutable. We could brainstorm ideas
around making them optionally mutable, and any issues that come with that. For
instance, a client might continue writing data to the region whose end key was
updated; those writes will fail, and the client will then have to look up meta
for the updated key-space range of the table.
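
The failure-then-lookup flow in that example could look roughly like the
following toy model in plain Java. StaleBoundarySketch and its members are
made-up names, the "meta" map stands in for the authoritative region layout,
and a boolean check stands in for the server rejecting an out-of-range write;
none of this reflects actual HBase client internals.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only, not HBase code.
class StaleBoundarySketch {
    // Authoritative layout: start key -> region name.
    final TreeMap<String, String> meta = new TreeMap<>();
    // Client-side copy of the layout; may lag behind meta.
    final TreeMap<String, String> clientCache = new TreeMap<>();
    int metaLookups = 0;

    private static String lookup(TreeMap<String, String> layout, String rowKey) {
        Map.Entry<String, String> e = layout.floorEntry(rowKey);
        return e == null ? null : e.getValue();
    }

    // The write is accepted only if the named region currently owns rowKey.
    boolean serverAccepts(String region, String rowKey) {
        return region.equals(lookup(meta, rowKey));
    }

    // Try the cached region first; on rejection, refresh from meta and retry.
    String put(String rowKey) {
        String cached = lookup(clientCache, rowKey);
        if (cached != null && serverAccepts(cached, rowKey)) {
            return cached;
        }
        clientCache.clear();
        clientCache.putAll(meta);
        metaLookups++;
        String fresh = lookup(clientCache, rowKey);
        if (fresh == null || !serverAccepts(fresh, rowKey)) {
            throw new IllegalStateException("no region owns key " + rowKey);
        }
        return fresh;
    }
}
```

In this model, a boundary change costs the client one failed write and one
meta lookup; subsequent writes to the new range go through the refreshed cache.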



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
