Kadir Ozdemir created HBASE-29308:
-------------------------------------

             Summary: Reducing region unavailability during region movement
                 Key: HBASE-29308
                 URL: https://issues.apache.org/jira/browse/HBASE-29308
             Project: HBase
          Issue Type: Improvement
            Reporter: Kadir Ozdemir

Region movement is the process of transferring a region from one RegionServer to another: the region is closed on the source RegionServer and then opened on the target RegionServer. In the current design, the region is unavailable for the whole period of closing it on the source RegionServer and then opening it on the target RegionServer.

The main operations during region close are flushing the MemStore, waiting for in-progress operations to complete (by acquiring the region operation lock exclusively), removing compacting files, and evicting the blocks in the block cache for the stores of the region. The operations for opening a region include reading the region info file, checking whether there are any WAL files to replay, and opening the store files and reading their metadata and possibly their bloom filters. Executing these steps sequentially takes time and prolongs the region's unavailability.

Most of the above operations can be done outside (before or after) the region's unavailability window. As described below, the window only needs to include flushing the MemStore on the source RegionServer and then loading the store files generated by that flush on the target RegionServer.

The region unavailability time can be reduced by introducing two new region states, WARMING and MOVING, as follows:

# A new copy of the region is opened on the target RegionServer. This copy is not yet visible to HMaster or to clients. It is set to the WARMING state, in which it is not ready to serve reads or writes. WARMING is an in-memory state and is not recorded in the meta table. WARMING regions need to be cleaned up if the region move operation fails. If a region remains in the WARMING state longer than a specified timeout period, the cleanup can be executed locally on the target RegionServer after the timeout.
# The next step is to put the region on the source RegionServer in the MOVING state. This triggers MemStore flushing. In the MOVING state, the region does not accept new (read or write) operations but continues serving in-progress read operations (gets and scans). Note that these in-progress reads are allowed as part of snapshot isolation. This is essentially the initial part of the region CLOSING state, where the MemStore is flushed.
# When the source region completes MemStore flushing, the target region is notified that new HFiles have been created for the region. The target region loads these files, meaning that it opens them and reads their metadata. Then the state of the region on the target RegionServer changes to OPEN, its location in the meta table is updated to the target RegionServer, and HMaster is notified of the change. At this point the region on the target RegionServer is visible to clients.
# Finally, the region on the source RegionServer is closed.

With this design, the region is unavailable for new operations only for the period of flushing the MemStore, loading the store files generated by that flush, updating the meta table, and notifying HMaster.
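
The four steps above can be sketched as a small state machine. This is only an illustration of the proposed ordering, not HBase code: the class `RegionMoveSketch`, the `RegionCopy` type, and the methods `move`, `transition`, and `cleanUpIfStale` are hypothetical names, and OPEN and CLOSED are simplified stand-ins for the existing region states. The flush, HFile load, meta update, and HMaster notification are represented only by comments.

```java
import java.util.ArrayList;
import java.util.List;

final class RegionMoveSketch {
    // OPEN and CLOSED stand in for existing states; WARMING and MOVING are the proposed ones.
    enum State { OPEN, WARMING, MOVING, CLOSED }

    // Hypothetical model of one copy of a region hosted on one RegionServer.
    static final class RegionCopy {
        final String server;
        State state;

        RegionCopy(String server, State state) {
            this.server = server;
            this.state = state;
        }

        // Only an OPEN copy accepts new reads or writes.
        boolean acceptsNewOperations() {
            return state == State.OPEN;
        }

        // A MOVING copy still lets in-progress gets and scans finish.
        boolean servesInProgressReads() {
            return state == State.OPEN || state == State.MOVING;
        }
    }

    // Moves a copy to the next state, rejecting out-of-order transitions.
    static void transition(RegionCopy copy, State expected, State next) {
        if (copy.state != expected) {
            throw new IllegalStateException(
                copy.server + ": expected " + expected + " but was " + copy.state);
        }
        copy.state = next;
    }

    static String snapshot(RegionCopy source, RegionCopy target) {
        return "source=" + source.state + " target=" + target.state;
    }

    // Walks through the four proposed steps and records the states after each one.
    static RegionCopy move(RegionCopy source, String targetServer, List<String> trace) {
        // Step 1: open a copy on the target; it is not visible to HMaster or clients.
        RegionCopy target = new RegionCopy(targetServer, State.WARMING);
        trace.add(snapshot(source, target));

        // Step 2: the source enters MOVING and flushes its MemStore.
        // The unavailability window for new operations starts here.
        transition(source, State.OPEN, State.MOVING);
        trace.add(snapshot(source, target));

        // Step 3: the target loads the flushed HFiles and becomes OPEN; the meta
        // table is updated and HMaster is notified. The window ends here.
        transition(target, State.WARMING, State.OPEN);
        trace.add(snapshot(source, target));

        // Step 4: the source copy is closed outside the window.
        transition(source, State.MOVING, State.CLOSED);
        trace.add(snapshot(source, target));
        return target;
    }

    // Local cleanup of a WARMING copy whose move did not complete within the timeout.
    static boolean cleanUpIfStale(RegionCopy copy, long warmingSinceMs, long nowMs, long timeoutMs) {
        if (copy.state == State.WARMING && nowMs - warmingSinceMs > timeoutMs) {
            copy.state = State.CLOSED;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        RegionCopy source = new RegionCopy("rs-source", State.OPEN);
        move(source, "rs-target", trace);
        trace.forEach(System.out::println);
    }
}
```

In this sketch, no copy accepts new operations only between the second and third recorded snapshots, which corresponds to the reduced unavailability window described above.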