Kadir Ozdemir created HBASE-29308:
-------------------------------------

             Summary: Reducing region unavailability during region movement
                 Key: HBASE-29308
                 URL: https://issues.apache.org/jira/browse/HBASE-29308
             Project: HBase
          Issue Type: Improvement
            Reporter: Kadir Ozdemir


Region movement is the process of transferring a region from one RegionServer 
to another: the region is closed on the source RegionServer and then opened on 
the target RegionServer. In the current design, the region is unavailable for 
the entire period of closing it on the source RegionServer and then opening it 
on the target RegionServer. 

The main operations during region close include flushing MemStore, waiting for 
in-progress operations to complete (by acquiring the region operation lock 
exclusively), removing compacting files, and evicting the blocks in the block 
cache for the stores of the region. The operations for opening a region include 
reading the region info file, checking if there are any WAL files to replay, 
opening store files, and reading their metadata and possibly bloom filters. 
Executing these steps sequentially takes time and prolongs the region's 
unavailability.

Most of the above operations can be done outside (before or after) the region’s 
unavailability window. As described below, we actually need to include only 
flushing MemStore on the source RegionServer, and then loading the store files 
generated during this MemStore flush on the target RegionServer in the 
unavailability window. 

The region unavailability time can be reduced by introducing two new region 
states, WARMING and MOVING, as follows:
 # A new copy of the region is opened on the target RegionServer. This copy is 
not yet visible to HMaster or clients. It is placed in the WARMING state, in 
which it is not ready to serve reads or writes. WARMING is an in-memory state 
and is not recorded in the meta table. WARMING regions need to be cleaned up if 
the region move operation fails. If a region remains in the WARMING state 
longer than a specified timeout period, the target RegionServer can perform 
this cleanup locally after the timeout.
 # The next step is to put the region on the source RegionServer in the MOVING 
state. This triggers MemStore flushing. In the MOVING state, the region does 
not accept new (read or write) operations but continues serving in-progress 
read operations (gets and scans). These in-progress reads are allowed as part 
of snapshot isolation. This is essentially the initial part of the region 
CLOSING state, where the MemStore is flushed.
 # When the region completes MemStore flushing, the target region is notified 
that new HFiles have been created for the region. The target region loads these 
files, meaning that it opens them and reads their metadata. Then the 
region state (for the region in the target RegionServer) will change to OPEN 
and its location info will be updated with the target RegionServer in the meta 
table, and the HMaster node will be notified about this change. Thus, the 
region on the target RegionServer will be visible to the clients.
 # Finally, the region on the source RegionServer will be closed.
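
The four steps above can be sketched as a small state walk-through. This is 
only an illustration of the proposed ordering, not HBase code: the class 
RegionMoveSketch, the RegionCopy type, the move() method, and the file and 
server names are all hypothetical, and the real change would involve HMaster 
procedures, RPCs, and the meta table, none of which are modeled here.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionMoveSketch {
    // WARMING and MOVING are the two states proposed in this issue.
    enum State { OPEN, WARMING, MOVING, CLOSED }

    /** Hypothetical in-memory stand-in for one copy of a region on a RegionServer. */
    static final class RegionCopy {
        final String server;
        State state;
        final List<String> storeFiles = new ArrayList<>();

        RegionCopy(String server, State state, List<String> files) {
            this.server = server;
            this.state = state;
            this.storeFiles.addAll(files);
        }

        // Only an OPEN region admits new reads or writes.
        boolean acceptsNewOperations() {
            return state == State.OPEN;
        }

        // A MOVING region keeps serving gets and scans that already started.
        boolean servesInProgressReads() {
            return state == State.OPEN || state == State.MOVING;
        }
    }

    /** Walks through the four steps and records "source/target" states after each. */
    static List<String> move(RegionCopy source, String targetServer, List<String> flushedFiles) {
        List<String> trace = new ArrayList<>();

        // Step 1: open a copy on the target with the existing store files (WARMING).
        RegionCopy target = new RegionCopy(targetServer, State.WARMING, source.storeFiles);
        trace.add(source.state + "/" + target.state);

        // Step 2: source enters MOVING and flushes its MemStore into new files.
        source.state = State.MOVING;
        source.storeFiles.addAll(flushedFiles);
        trace.add(source.state + "/" + target.state);

        // Step 3: target loads only the newly flushed files and becomes OPEN.
        target.storeFiles.addAll(flushedFiles);
        target.state = State.OPEN;
        trace.add(source.state + "/" + target.state);

        // Step 4: source is closed outside the unavailability window.
        source.state = State.CLOSED;
        trace.add(source.state + "/" + target.state);
        return trace;
    }

    public static void main(String[] args) {
        RegionCopy source = new RegionCopy("rs1", State.OPEN, List.of("hfile-1"));
        // Prints [OPEN/WARMING, MOVING/WARMING, MOVING/OPEN, CLOSED/OPEN]
        System.out.println(move(source, "rs2", List.of("hfile-2")));
    }
}
```

In this sketch, the only interval in which neither copy accepts new 
operations is between the second and third trace entries (MOVING/WARMING), 
which corresponds to the flush and the loading of the flushed files.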

With this design, the region will be unavailable for new operations only for 
the period of flushing MemStore, loading store files generated by MemStore 
flushes, updating the meta table, and notifying HMaster. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
