Kadir Ozdemir created HBASE-29308:
-------------------------------------
Summary: Reducing region unavailability during region movement
Key: HBASE-29308
URL: https://issues.apache.org/jira/browse/HBASE-29308
Project: HBase
Issue Type: Improvement
Reporter: Kadir Ozdemir
Region movement is the process of transferring a region from one RegionServer
to another where the region on the source RegionServer is closed and this
region is opened on the target RegionServer. In the current design, the region
is unavailable for the period of closing the region on the source RegionServer
and then opening it on the target RegionServer.
The main operations during region close include flushing MemStore, waiting for
in-progress operations to complete (by acquiring the region operation lock
exclusively), removing compacting files, and evicting the blocks in the block
cache for the stores of the region. The operations for opening a region include
reading the region info file, checking if there are any WAL files to replay,
opening store files, and reading their metadata and possibly bloom filters.
Because these steps run one after another while the region is offline, each of
them adds to the region's unavailability.
Most of the above operations can be done outside (before or after) the region's
unavailability window. As described below, the window only needs to include
flushing the MemStore on the source RegionServer and then loading the store
files generated by that flush on the target RegionServer.
The region unavailability time can be reduced by introducing two new region
states, WARMING and MOVING, as follows:
# A new copy of the region is opened on the target RegionServer. This copy is
not yet visible to HMaster or clients. The region is set to the WARMING state,
in which it is not ready to serve reads or writes. WARMING is an in-memory
state and is not recorded in the meta table. WARMING regions need to be cleaned
up if the region move operation fails. If a region remains in the WARMING state
longer than a specified timeout period, this cleanup can be executed locally on
the target RegionServer after the timeout.
# The next step is to put the region on the source RegionServer in the MOVING
state. This triggers MemStore flushing. In the MOVING state, the region does
not accept new (read or write) operations but continues serving in-progress
read operations (gets and scans). Allowing these in-progress reads to finish is
consistent with snapshot isolation. This is essentially the initial part of the
region CLOSING state, where the MemStore is flushed.
# When the source region completes MemStore flushing, the target region is
notified that new HFiles have been created for the region. The target region
loads these files, meaning that it opens them and reads their metadata. Then
the state of the region on the target RegionServer changes to OPEN, its
location in the meta table is updated to the target RegionServer, and HMaster
is notified of this change. At that point, the region on the target
RegionServer becomes visible to clients.
# Finally, the region on the source RegionServer will be closed.
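The four steps above can be sketched as a small simulation. This is only an illustration of the proposed ordering, not HBase code: the class RegionMoveSketch, the RegionCopy type, the server names, and the file names are all hypothetical, and only the WARMING and MOVING state names come from the proposal. After each step, the sketch records which RegionServer accepts new operations.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionMoveSketch {
    /** WARMING and MOVING are the two proposed states; OPEN and CLOSED are simplified here. */
    public enum State { OPEN, WARMING, MOVING, CLOSED }

    /** Hypothetical stand-in for one copy of a region hosted on one RegionServer. */
    public static final class RegionCopy {
        public final String server;
        public State state;
        public final List<String> storeFiles = new ArrayList<>();

        public RegionCopy(String server, State state) {
            this.server = server;
            this.state = state;
        }

        /** In this sketch only an OPEN copy accepts new reads or writes. */
        public boolean acceptsNewOperations() {
            return state == State.OPEN;
        }
    }

    /** Runs the four steps; returns, per step, the server accepting new operations. */
    public static List<String> move(RegionCopy source, RegionCopy target, String flushedFile) {
        List<String> trace = new ArrayList<>();

        // Step 1: warm a copy on the target by opening the existing store files.
        target.state = State.WARMING;
        target.storeFiles.addAll(source.storeFiles);
        trace.add(serving(source, target));

        // Step 2: the source enters MOVING and flushes its MemStore to a new file.
        source.state = State.MOVING;
        source.storeFiles.add(flushedFile);
        trace.add(serving(source, target));

        // Step 3: the target loads the flushed file and becomes OPEN.
        target.storeFiles.add(flushedFile);
        target.state = State.OPEN;
        trace.add(serving(source, target));

        // Step 4: the source copy is closed.
        source.state = State.CLOSED;
        trace.add(serving(source, target));
        return trace;
    }

    /** Name of the server accepting new operations, or "none" inside the window. */
    static String serving(RegionCopy source, RegionCopy target) {
        if (source.acceptsNewOperations()) return source.server;
        if (target.acceptsNewOperations()) return target.server;
        return "none";
    }

    public static void main(String[] args) {
        RegionCopy source = new RegionCopy("rs1", State.OPEN);
        source.storeFiles.add("hfile-1");
        RegionCopy target = new RegionCopy("rs2", State.CLOSED);
        System.out.println(move(source, target, "hfile-2"));
    }
}
```

Running main prints [rs1, none, rs2, rs2]: the only step in which no copy accepts new operations is the one between the source entering MOVING and the target becoming OPEN.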
With this design, the region will be unavailable for new operations only for
the period of flushing MemStore, loading store files generated by MemStore
flushes, updating the meta table, and notifying HMaster.
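The timeout-based cleanup of WARMING regions mentioned in the first step could look roughly like the following. This is a minimal sketch under assumptions: the class WarmingRegionReaper and its methods are hypothetical names, timestamps are passed in explicitly, and a real implementation would also have to release the opened store files and any other resources held by the warming copy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class WarmingRegionReaper {
    private final long timeoutMillis;
    // Region name -> time (ms) at which the copy entered WARMING on this server.
    private final Map<String, Long> warmingSince = new HashMap<>();

    public WarmingRegionReaper(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called when a copy of the region starts warming on this RegionServer. */
    public void startWarming(String regionName, long nowMillis) {
        warmingSince.put(regionName, nowMillis);
    }

    /** Called when the move succeeds and the warming copy becomes OPEN. */
    public void promotedToOpen(String regionName) {
        warmingSince.remove(regionName);
    }

    /** Drops and returns the regions that have been WARMING longer than the timeout. */
    public List<String> reapExpired(long nowMillis) {
        List<String> reaped = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = warmingSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                reaped.add(entry.getKey());
                it.remove();
            }
        }
        return reaped;
    }
}
```

Because WARMING is an in-memory state that is not recorded in the meta table, this bookkeeping can stay local to the target RegionServer.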
--
This message was sent by Atlassian Jira
(v8.20.10#820010)